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V  Abstract 


The  numerical  solution  of  discrete  approximations  to  the  first  bi¬ 
harmonic  boundary  value  problem  in  rectangular  domains  is  studied.  Several 
finite  difference  schemes  are  compared  and  a  family  of  new  fast  algorithms 
for  the  solution  of  the  discrete  systems  is  developed.  These  methods 

p 

are  optimal,  having  a  theoretical  computational  complexity  of  O(N^) 

p 

arithmetic  operations  and  requiring  N^+0(N)  storage  locations  when 
solving  the  problem  on  an  N  by  N  grid.  Several  practical  computer 

implementations  of  the  algorithm  are  discussed  and  compared.  These  im- 
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pi ementations  require  aN  +  bN  logN  arithmetic  operations  with  b«a. 

The  algorithms  take  full  advantage  of  vector  or  parallel  computers  and 
can  also  be  used  to  solve  a  sequence  of  problems  efficiently.  A  new 
fast  direct  method  for  the  biharmonic  problem  on  a  disk  is  also  developed. 
It  is  shown  how  the  new  method  of  solution  is  related  to  the  associated 
eigenvalue  problem.  The  results  of  extensive  numerical  tests  and  com¬ 
parisons  are  included  throughout  the  dissertation. 

It  is  believed  that  the  material  presented  provides  a  good  founda¬ 
tion  for  practical  computer  implementations  and  that  the  numerical  solu¬ 
tion  of  the  biharmonic  equation  in  rectangular  domains  from  now  on,  will 


be  considered  no  more  difficult  than  Poisson's  equation., 
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CHAPTER  I 


THE  CONTINUOUS  PROBLEM 


Let  fl  be  an  open  set  in 
following  problem: 


with  boundary 


3G. 


Consider  the 


A2u(x,y)  =  f(x,y)  (x,y)  e  G 

u(x,y)  =  g(x,y)  (x,y)  e  3ft 

un(x,y)  =  h(x,y)  (x,y)  e  3ft 


(1.1) 


where  u  denotes  the  exterior  normal  derivative  on  an. 
n 

This  thesis  will  develop  efficient  numerical  methods  for  the  above 

problem  when  n  is  a  rectangle  or  a  circular  disk.  The  algorithms  are 

2  2 
optimal,  requiring  0(N  )  arithmetic  operations  and  0(N  )  storage 

2 

locations  for  computing  an  approximate  solution  at  N  discrete  grid- 
points. 

In  this  Chapter  some  physical  problems  that  lead  to  equations  like 
(1.1)  will  be  described  together  with  a  few  mathematical  properties  rele¬ 
vant  for  the  construction  of  numerical  methods.  Discrete  approximations 
to  (1.1)  are  discussed  in  Chapter  II,  and  the  theory  behind  the  numerical 
algorithm  for  the  rectangular  domain  is  developed  in  Chapter  III.  Chap¬ 
ter  IV  discusses  the  implementation  of  numerical  algorithms  and  the  de¬ 
sign  of  computer  programs.  Some  numerical  results  for  a  few  applications 
of  the  algorithms  to  some  difficult  problems  are  presented  in  Chapter  V. 

Equation  (1.1)  is  called  the  (first)  Dirichlet  boundary  value  pro¬ 
blem  for  the  bi harmonic  operator 


2  2 
3x  3y 


(1.2) 
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and  this  problem  arises  in  several  fields  of  applied  mathematics.  Clas¬ 
sical  examples  occur  in  elasticity  theory  and  in  the  theory  of  fluid 
mechanics. 

In  linear  elasticity  u(x,y)  can  represent  the  Airy  stress  function 
or  as  in  the  theory  of  thin  plates,  the  vertical  displacement  due  to  an 
external  force.  In  the  latter  case  equation  (1.1)  represents  a  "clamped 
plate"  where  f  is  the  external  load.  Another  closely  related  case  is 
that  of  a  "supported  plate"  where  the  boundary  conditions  in  (1.1)  are 
replaced  by 


u(x,y)  =  g(x,y) 
oAu(x,y)  +  (l-o)unn(x,y)  =  h(x,y) 


(x,y)  e  an 
(x.y)  e  an 


(1.3) 


where  u  is  the  second  normal  derivative  and  a  is  a  material  constant 
nn 

called  Poisson's  ratio. 

When  n  is  a  polygon,  this  is  equivalent  to  a  problem  (with  data 
depending  on  o)  of  the  form: 


-A v  =  f  in  0 

v  =  -  h  on  an 

(1.4) 

-  Au  =  v  in  n 

u  =  g  on  an 


where  v  =  -  Au  has  been  introduced.  The  original  fourth  order  equation 
has  been  split  into  two  Poisson  problems.  There  exist  many  reliable  com¬ 
puter  programs  that  can  be  used  to  solve  (1.4)  in  an  efficient  way  both 
for  special  geometries  (Swarztrauber  and  Sweet  ^1 975j  ),  and  in  more  general 
domains  (Proskurowski  ^1978^ ) .  It  is  important  to  notice  that  the  only 
difference  between  (1.1)  and  (1.4)  is  that  different  boundary  data  have 


The  theory  of  thin  plates  allowing  large  vertical  displacements, 
leads  to  a  coupled  pair  of  nonlinear  equations  known  as  von  KSrmSns  equa¬ 


tions: 


A2u  =  ^u,vj  +  f 

in 

n 

U  = 

on 

3n 

un  =  hl 

on 

sn 

a2v  =  -  [u.u] 

in 

n 

V  -  g2 

on 

dn 

vn  =  h2 

on 

3Q 

where 


_  3^u  3^v  +  32u  32v  ?  32u  32v 

'  ax2  a,2  7?77  3* s* 3*  >y  • 


(1.5) 


Here  u  represents  the  vertical  displacement  of  the  plate,  v  is  the 
Airy  stress  function  and  f  is  the  external  force  on  the  plate.  An 
efficient  method  for  solving  linear  problems  involving  the  biharmonic 
operator  (with  the  appropriate  boundary  conditions)  can  be  very  valuable 
in  iterative  methods  for  solving  more  difficult  problems  of  this  type. 

References  describing  equations  involving  the  biharmonic  operator 
in  elasticity  include  Landau  and  Lifchitz  J^197ol ,  Muskhelishvili  [l963]  , 
Sokolnikoff  ^1946j ,  Kupradze  ^1965j  and  Kalandiya  ^1975]  .  More  recent 
texts  on  finite  elements  methods,  Strang  and  Fix  [l973]  ,  Zienkiewicz 
^1977j  and  Ciarlet  ^1978j  provide  additional  information. 

In  fluid  mechanics,  equation  (1.1)  describes  the  streanrfunction 
u(x,y)  of  an  incompressible  two-dimensional  creeping  flow  (Reynolds 
number  zero).  Efficient  numerical  methods  for  this  problem  can  also  be 
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problem  (1.1).  (Lions  and  Magenes  ^1972].) 

i i i )  Simplification  of  equation  (1,1). 

Assuming  that  (1.4)  can  be  solved,  there  is  no  loss  of  generality 
to  take  f  =  g  =  0  when  discussing  equation  (1.1).  This  follows  by 
letting  u  *  u^  +  u2  where  u^  solves  (1.4).  The  equation  for  u?  is 
then  of  the  desired  form. 

iv)  The  biharmonic  operator  under  a  conformal  coordinate  transformation. 
Assume  that  s  :  C  -*■  C  maps  a  region  Qz  in  the  z-plane  confor¬ 
mally  onto  a  region  in  the  w-plane,  and  let  Az  and  &w  denote 
the  Laplace  operators  in  the  two  regions.  Then 

A^u(z)  =  |s'(z)|2  Aw(|s‘(s_1(w))|2  Awu(s_1(w)))  .  (1.6) 

This  transformation  is  useful  when  the  map  s  from  a  simple  (computa¬ 
tional)  domain  nz  to  the  (physical)  domain  is  known  or  can  be  com¬ 
puted.  References  using  conformal  mapping  and  complex  variable  techni~ 
ques  include  Muskhel ishvi 1 i  ^ 1 953]  and  Kantorovich  and  Krylov  [l958]  . 

v)  Biharmonic  functions. 

Any  function  u(x,y)  satisfying  equation  (1.1)  with  f  =  0  is 
called  biharmonic.  Any  biharmonic  function  u  can  be  written 

u(z)  =  Re[z  <p  (z)  +  x(z)] 

where  <J>  and  x  are  analytic  functions.  Conversely,  given  <j>  and  x 
analytic,  the  above  expression  defines  a  biharmonic  function  u.  This 
representation  is  due  to  Goursat  ^1898]  .  If  ft  is  starshaped  and  u 
is  bi harmonic,  then 

? 

u  =  r  v  +  w  (1.7) 


|u(x,y)|  <  (24>(x,y)Jsmax{h(x,y)  | 

3  fl 

where  A<p  =  -  1  in  ft,  <$>  =  0  on  3n  . 

Extensions  of  this  result  to  the  case  where  g  is  nonzero  are  given 
by  Rosser  [l98o]  for  circular  and  rectangular  domains. 
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Another  type  of  a  priori  inequality  is  given  by 


dxdy  <_  ctj 


L 


(A2u)2dxdy  +  a- 


/  u2ds  +  a,  /  u2ds  +  a-  /  u?ds 

4  3  4  n  44 1 


Here  ut  is  the  tangential  derivative  and  ctp  a^,  and  a4  are  (In 
principle)  computable  constants  depending  on  the  domain.  This  inequality 
holds  for  any  sufficiently  smooth  function.  (Sigillioto  ^1976]). 
viii)  The  coupled  equation  approach. 

Consider  the  following  algorithm  for  solving  (1.1).  Let  A0  =  0, 
then  for  n  =  0,1,2,.. 

-  Avn  =  f  in  H 

v11  =  An  on  30 

-  Aun  =  vn  in  0 

un  =  g  on  30 

An+1  =  An  +  p(|^  -  h)  0  <  p  <  2/y 


where 


u  = 


max 


veH  (0)  HhJ(O) 
v  1  0 


/  |v/  ds/f| 
,  30  / J 0 


Av|  dxdy 


For  sufficiently  smooth  data  it  can  be  proved  that 


lim{un,vn}  =  {u,  -  Au} 
n 


see  MacLaurin  [l974j  or  Glowinski-Lions-Tremolieres  Jl976,  Chap.  4]. 
ix)  Relation  between  u„  and  Au  on  30  . 

Let  v  =  -Au  and  assume  that  A  =  v|3fi  is  known.  Equation  (1.1) 


-8- 


can  then  be  solved  as  two  decoupled  problems.  Let  A  denote  the  linear 
operator  mapping  A  to  up.  Glowinski  and  Pironneau  [l979]  proved  that 
A  is  a  symmetric,  strongly  elliptic  operator  mapping  H->s(3fl)  to 
This  is  the  basis  for  the  mixed  finite  element  method  they  propose  for 
problem  (1.1). 
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CHAPTER  II 

THE  DISCRETE  APPROXIMATION 

This  Chapter  will  discuss  discrete  approximations  to  the  continuous 
biharmonic  problem.  The  Chapter  consists  of  four  sections.  First,  a  few 
possible  finite  difference  schemes  will  be  introduced.  Necessary  modifi¬ 
cations  at  gridpoints  near  the  boundary  are  discussed  and  some  properties 
of  the  resulting  linear  systems  of  equations  are  mentioned. 

The  second  part  summarizes  known  theoretical  results  on  the  conver¬ 
gence  of  the  finite  difference  solution  to  that  of  the  continuous  problem. 
Several  conflicting  results  can  be  found  in  the  literature  and  the  re¬ 
view  of  this  material  is  intended  to  clarify  the  knowledge  of  this  sub¬ 
ject. 

The  algorithms  proposed  in  this  thesis  make  it  feasible  to  solve 
discrete  approximations  on  much  finer  grids  than  previous  methods  could 
handle  using  limited  computer  resources.  This  made  it  possible  to  perform 
fairly  extensive  tests,  solving  a  class  of  test  problems  over  a  wide  range 
of  grids  in  order  to  numerically  test  the  theoretical  convergence  rates 
and  compare  so.iie  of  the  proposed  approximations.  The  third  section  of  the 
Chapter  contains  a  summary  of  the  calculations  performed  and  some  of  the 
resulting  conclusions. 

The  last  section  contains  a  short  discussion  of  previously  proposed 
methods  for  solving  discrete  approximations  to  the  biharmonic  equation. 

2.1  Finite  difference  schemes. 

Most  finite  difference  schemes  proposed  for  the  biharmonic  equation 
have  only  been  applied  to  regions  made  up  of  unions  of  rectangles.  For 
more  general  regions  8ramble  [l966]  proposed  an  elegant  scheme  which 


..  *4  - 
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employs  the  13-point  stencil.  However,  due  to  the  difficulties  in  hand¬ 
ling  the  difference  approximation  close  to  a  curved  boundary,  more  general 
domains  should  usually  be  treated  within  the  framework  of  the  finite 
element  method.  The  study  of  finite  difference  schemes  and  the  efficient 
numerical  solution  of  the  resulting  equations  derived  from  regular  geo¬ 
metries  is  still  useful  for  at  least  two  reasons.  There  are  several  im¬ 
portant  problems  where  the  geometry  is  regular  or  where  it  is  convenient 
to  make  a  coordinate  transformation  from  the  physical  region  to  a  compu¬ 
tational  domain  with  a  simple  geometry.  In  addition,  efficient  numerical 
methods  for  regular  grids  can  contribute  to  the  development  of  fast  methods 
for  solving  finite  element  equations  resulting  from  triangulations  that 
are  regular  in  the  interior  of  a  more  general  region.  This  line  of  de¬ 
velopment  is  already  very  evident  in  the  work  of  Proskurowski  and  Widlund 
[l976] ,  [l98o]  on  second  order  elliptic  equations. 

The  following  discussion  will  be  restricted  to  a  rectangular  region 

R.  Let  R  be  covered  by  a  uniform  grid  such  that  the  boundary  of  R 

falls  on  gridlines.  This  is  illustrated  in  Figure  2.1,  which  also  de- 

★  • 

fines  three  disjoint  sets  of  gridpoints  ,  R^,  R^  and  Rh: 


Figure  2.1.  The  uniform  discretization  of  R. 
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Rh  =  {+)  »  the  set  of  all  gridpoints  having  only  interior  points  as  neigh¬ 
bors. 

Rh  *  {*}  ,  the  set  of  all  interior  gridpoints  having  at  least  one  neigh¬ 
bor  on  the  boundary. 

Rh  =  {•}  the  set  of  boundary  gridpoints. 


The  finite  difference  approximations  are  most  conveniently  described  us¬ 
ing  stencils.  For  example. 


1 

1  -4  1 

1 


u  =  Au 


+  — ( D4  + 
12iUl 


D?)u 


0(h4) 


(2.1) 


defines  the  usual  5-point  approximation  to  the  Laplace  operator  and  shows 

2 

that  this  approximation  has  a  local  truncation  error  of  order  h  . 

(Di  =  jj-  ,  i  =  1  or  2).  The  classical  13-point  approximation  for  the 
bi harmonic  operator  is  most  easily  derived  by  applying  the  above  5-point 
operator  twice: 


u  =  A5(A5u) 


1 

2 

-8 

2 

1  -8 

20 

-8 

1 

2 

-8 

2 

1 

A2u 


+  f(D? 


oJd|  + 


2  4 

°1D2  + 


0*)u 


0(h4) 


(2.2) 


(The  operator  A5U5  u)  is  formed  by  first  forming  a^(a  u)  and 
then  substituting  for  u  using  2.1).  Unlike  the  discrete  Laplace  opera¬ 
tor  which  can  be  applied  to  all  interior  gridpoints  P  e  RhUR^»  this 


-12- 


operator  is  only  well  defined  on  the  points  P  e  Rh*  Two  alternative 

2  * 
approximations  of  A  u(P)  will  be  considered  for  P  t  R^. 

i)  Quadratic  extrapolation. 

Use  the  normal  derivative  boundary  condition  at  the  point  Qe 
nearest  P  £  to  formally  get  a  local  0(h  )  accurate  extrapolated 
value  at  the  "missing"  (exterior)  point  in  the  stencil.  This  results  in 
a  stencil  of  the  form 


Aq2u(P)  i  K 


1 

2 

1 

00 

2 

co 

i 

21 

00 

1 

1 

C\J 

CO 

1 

2 

1 

u(P)  +  2hun(Q) 


A2u(P)  +  0 ( h-1 ) 


(2.3) 


when  applied  to  a  point  P  e  near  the  left  boundary.  Notice  that  un 
always  denotes  the  exterior  normal  derivative  evaluated  at  the  boundary. 

A  similar  procedure  (eliminating  two  exterior  points)  when  P  e  R^  is 
a  cornerpoint  results  in  a  weight  of  22  at  the  center  point  of  the  sten¬ 
cil. 

i i )  Cubic  extrapolation. 

Using  the  same  approach  as  in  i),  but  performing  a  cubic  line-extra- 
polation  results  in  an  0(h  )  accurate  approximation  of  the  "missing" 
point.  The  discrete  biharmonic  operator  becomes 


1 

2 

-8 

2 

-8 

23 

-8.5  1 

2 

-8 

2 

1 

u(P)  +  3hun(Q)  -  |  u(Q) J  =  A2u(P)+0(l) 

(2.4) 
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★ 

at  a  point  Pe  ^  near  the  left  boundary.  Notice  that  this  leads  to  an 
unsymmetric  coefficient  matrix. 

Gupta  [l975]  considered  two  families  of  boundary  approximations  of 
the  above  form,  but  depending  on  integer  parameters  indicating  which  in¬ 
terior  points  to  use  in  the  extrapolation.  His  iterative  method  converged 
faster  if  points  further  away  from  the  boundary  were  chosen.  This  clearly 
results  in  larger  truncation  errors.  Since  the  algorithms  in  this  thesis 
can  handle  the  approximations  that  furnish  the  smallest  truncation  errors, 
only  these  two  choices  will  be  considered.  (The  quadratic  and  cubic  ex¬ 
trapolation  near  the  boundary  is  equivalent  to  the  schemes  p  =  1  and 
p  =  2,  q  =  1  respectively  in  the  notation  of  the  above  author.) 

Glowinski  [l973]  made  the  observation  that  the  13-point  finite  dif¬ 
ference  scheme  combined  with  quadratic  extrapolation  near  the  boundary 
is  equivalent  to  solving  the  biharmonic  equation  using  a  mixed  finite 
element  method  and  piecewise  linear  elements  in  the  triangulation  shown 
in  Figure  2.2. 


Figure  2.2.  Finite  element  triangulation  corresponding 
to  the  13-point  difference  stencil. 


-  -  - 


There  are  many  alternative  finite  difference  approximations  that  can 
be  derived  for  the  biharmonic  operator.  By  rotating  the  coordinate  sys¬ 
tem  tt/4  the  approximation 

2 

u  =  Au  +  y£(dJ  +  6d|d|  +  D4)u  +  0(h4)  (2.5) 


to  the  Laplace  operator  follows  from  the  5-point  stencil  given  earlier. 
From  this  operator  an  alternative  13-point  approximation  is  obtained: 


This  stencil  is  not  as  convenient  since  it  depends  on  twice  as  many  points 
at  a  distance  2h  from  the  center  point.  Combining  the  two  approxima¬ 
tions  to  the  Laplacian  results  in  a  17-point  approximation. 


(2.7) 


2 

=  A2  u  +  -g-(D®  +  4  oJd2  +  4  D2D4  +  D®)u  +  0(h4)  . 

By  taking  suitable  linear  combinations  of  the  above  stencils  for  the  bi- 
harmonic  operator  it  is  possible  to  derive  approximations  that  can  be 
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used  to  construct  higher  order  schemes.  For  example 


(j  +  |  a[7)u  =  A2  u  +  jj-  A3  u  +  0(h4)  . 


(2.8) 


Combining  this  with  the  idea  of  also  forming  differences  on  the  right 
hand  side  of  the  equation  (Mehrstellenverfahren,  Collatz  [l955]  )  yields 
a  locally  fourth  order  accurate  approximation  that  has  been  studied  by 
Zurmuhl  [l957]  : 


1 

7 


1 

1 

1 

1 

-2 

-10 

-2 

1 

1 

-10 

36 

o 

r— 1 

1 

1 

1 

-2 

-10 

-2 

1 

1 

1 

1 

u  +  0(h4) 


•  (2.9) 


Zurmuhl  derives  rather  complicated  stencils  that  can  be  applied  to  points 

it  O 

P  e  Rh  having  local  truncation  error  0(h  ).  Based  on  the  material  in 
this  Chapter  (Section  2.2  and  2.3)  it  is  likely  that  less  accurate  approxi 
mations  near  the  boundary  would  be  sufficient. 

It  should  be  noticed  that  all  the  linear  systems  of  equations  de¬ 
rived  from  the  above  stencils  (ignoring  the  irregularity  caused  by  the 
special  boundary  approximations)  can  be  efficiently  solved  using  for  ex¬ 
ample  the  fast  Fourier  transform  (Henrici  [l979]). 

The  systems  of  linear  equations  derived  from  the  finite  difference 

approximations  discussed  so  far,  are  all  positive  definite  with  a  condi- 

-4 

tion  number  proportional  to  h  .  Due  to  the  special  approximations  used 

★ 

for  the  points  P  E  Rh,  the  matrices  are  often  not  symmetric,  but  they 
can  usually  be  considered  as  perturbed  symmetric  matrices. 

The  matrices  do  not  possess  property  A  (Young  |l972j).  Finite  dif¬ 
ference  approximations  of  the  biharmonic  operator  that  lead  to  linear 


systems  having  property  A  have  been  considered  by  Tee  J^l 963J  and  Tang 
Jl964j  for  rectangular  and  hexagonal  meshes  respectively.  Proper  treat¬ 
ment  of  the  points  near  the  boundary  remains  a  serious  problem  with  these 
schemes  since  inaccurate  boundary  approximations  must  be  used  in  order 
not  to  destroy  the  matrix  structure.  (These  matrices  have  interesting 
block  properties;  see  Parter  ^1959],  Chapter  14  of  Young  ^1972j,  Suzbee, 
Golub  and  Howell  [l977].) 

2.2  Discretization  error  estimates. 

There  exist  very  few  papers  in  the  literature  discussing  the  global 
discretization  errors  of  the  finite  difference  schemes  presented  in  the 
previous  section.  This  is  in  contrast  to  the  case  of  second  order  ellip¬ 
tic  problems  where  the  theory  is  well  understood.  The  main  reason  for 
this  is  probably  the  fact  that  there  is  no  maximum  principle  for  higher 
order  equations,  while  the  maximum  principle  valid  both  in  the  continu¬ 
ous  and  discrete  case  for  second  order  problems  provides  an  important  tool 
for  the  analysis. 

The  first  results  proving  that  the  13-point  approximation  (2.2)  con¬ 
verges  to  the  solution  of  the  continuous  problem,  was  given  by  Courant, 
Friedrichs  and  Lewy  ^1928j .  The  main  references  for  this  section  are 
Bramble  ^1966]  and  ZlSmal  [l967j  .  The  more  recent  analysis  by  Gupta  [l975j 
is  based  on  the  above  paper  by  ZlSmal,  but  some  of  the  discretization 
error  estimates  given  can  be  improved. 

Let  v  denote  a  function  defined  on  each  gridpoint  P(x,y)  e  R 
(R  a  Rh^Rh^^h^  and  extended  the  value  zero  outside  R.  The  fol¬ 
lowing  norms  and  notation  will  be  used  in  this  section: 
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vx(x,y)  =  £(v(x  +  h,y)  -  v(x,y)) 

\x(x’y)  E  H(vx(x’y>  -  vx(x  *  h’y>> 
and  similarly  for  and  v^y. 


(2.10) 


!|v  1 1 q  =  h2  I  v(P)2 

U  PeR 

Hv  Ilf  =  IMIq+  II v  x  II  0  +  ilvy  II 0  (2-U) 

llv  111  E  II V  II |  +  ||vx  ||f  +||Vy||2  . 

__  ★ 

The  norms  of  the  restriction  of  v  to  points  P  e  or  P  e  are 

defined  by: 


llv 


h  E  v(P)2 
PeRh 


v  II 


h2  E  v (P)2 
P£Rh 


(2.12) 


The  norms  containing  the  discrete  derivatives  are  defined  in  a  similar 
way  as  above. 

The  following  lemma  holds  for  any  mesh  function  v  vanishing  out¬ 
side  R. 

Lemma  2.1  (Discrete  a  priori  inequalities.) 

(1)  llv  l!0  i  C<  HVX  II  0  *  HVy  M  0)>S 

(H)  max  |»(P)|  <  c|log  h-1!^  ||vx  ||j  +  l|v  llg)*5 

(iii)  l|v|l1lc(h-1|iv||0>R*+  ||A23  v||0>Rh) 


for  some  constant  c  independent  of  h. 

This  is  proved  in  Bramble  ^1966],  an  alternative  proof  of  ( i i i )  is 


r 
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given  by  Kuttler  [l97lj  . 

Lemma  2.2  (Extrapolation  near  the  boundary). 

Let  v  be  any  mesh  function  vanishing  outside  R.  If  the  approxi¬ 
mation  defined  by  (2.2)  and  (2.3)  is  used,  the  following  estimates  hold: 


<>  II*  ll0,R*  *  h3/2  II*  Ho  i  c  h3/2l"5/2  K  *  II o,R*  +  H413  *  ll0.R. 

h  nr 


id 


|v||2  <  c(hs  I)a|  V  |||>R.  ♦  114^3  v  llS.Rh)‘S 


for  some  constant  c  independent  of  h. 

Inequality  i)  can  be  found  in  Kuttler  |l97lj  ,  ii)  is  proved  by  Zlamal 
^1967]  .  Inequality  ii)  also  holds  if  Aq  is  replaced  by  (2.4),  it 
can  therefore  be  used  to  analyze  the  case  where  cubic  extrapolation  is 
used  near  the  boundary. 

Let  uh  denote  the  solution  to  a  finite  difference  approximation  of 
the  first  biharmonic  boundary  value  problem  using  one  of  the  discretizations 
defined  in  the  previous  section.  Let  u  be  the  continuous  solution  of 
the  problem  and  assume  that  u  G  C^(R).  Lenina  2.1  and  2.2  can  now  be 
used  to  estimate  the  global  discretization  error. 


Theorem  2.1 

O 

Assume  the  13-point  operator  is  used  on  R  and  let  c  de¬ 

note  some  constant  independent  of  h.  Then 

i)  max|u.(P)  -  u(P)|<  c |  log  h"*!*5  h2 
PeR  n 

ii)  ||uh(P)  -  u(P)  ||:  lc  h2 

iii)  ||uh(P)  -  u(P)  ||2  <  c  h3/2 
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iv)  ||uh(P)  -  u(P)||0  R*  £  c  h3 

'  h 

2  * 

if  the  quadratic  extrapolation  scheme  a  is  used  on  R.  ,  and 

q  h 

v)  max  |u. (P)  -  u(P)|  <  c  h2 

PeR  " 

vi)  max(|(u.(P)  -  u(P) )  j  +  |(uh(P)  -  u(P))  |)  <  c  |  log  h-1!5*  h2 

PeR  x  n  y 

vii)  ||uh(P)  -  u(P)  |l2  <  c  h2 

if  the  cubic  extrapolation  scheme  ac  is  used  on  R^. 

Proof : 

2  2  2 

The  local  truncation  error  of  is  0(h  ),  while  A^  has  local 

-1  2 

truncation  error  0(h  )  and  ac  is  0(1)  (see  section  2.1).  Using 

this  and  i)  of  Lemma  2.2  gives  iv).  Combining  this  with  ii)  and  iii)  in 
Lemma  2.1  gives  i)  and  ii).  Statement  iii)  and  vii)  follows  from  ii)  of 
Lemma  2.2.  Using  vii)  and  ii)  of  Lemma  2.1  gives  vi).  Finally,  the  dis¬ 
crete  Sobolev  inequality  (Sobolev  ^194o] )  applied  to  vii)  proves  v). 
Remarks: 

i)  The  above  results  i)  -  iv)  still  hold  in  a  more  general  region  with 
curved  boundary  using  a  suitable  generalization  of  the  quadratic  extra¬ 
polation  scheme.  (Bramble  [l966j,  Zl£mal  ^1967]). 

ii)  It  is  unsatisfactory  that  Lemma  2.1  and  2.2  involve  specific  dis¬ 
cretizations.  More  general  proofs  would  make  it  possible  to  estimate  the 
global  discretization  error  of  a  given  finite  difference  scheme  from  the 
local  truncation  errors.  Both  theoretical  and  computational  evidence  make 
it  reasonable  to  believe  that  the  lemmas  hold  for  a  wider  class  of  approx¬ 


imations. 
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3 

iii)  Notice  that  the  error  near  the  boundary  estimated  in  iv),  is  0(h  ), 
an  order  of  magnitude  smaller  than  the  overall  error  despite  a  local  trun- 
cation  error  of  0(h)  when  using  A^j  u(P)  for  PeR^.  Strang  and  Fix 
^1973,  p.  202]  discusses  a  similar  phenomenon  for  second  order  equations. 

iv)  The  only  important  difference  between  quadratic  and  cubic  boundary 
approximation  is  the  estimates  for  the  discrete  second  derivatives  (iii) 
and  (vii).  This  can  be  significant  in  several  applications  where  Au 
represents  an  essential  physical  quantity.  (See  also  section  2.3). 

v)  Gupta  ^1975 J  used  the  discrete  Sobolev  inequality  on  iii)  and  obtain- 

3/2 

ed  the  weaker  result  0(h  )  instead  of  i). 

vi)  Some  of  the  estimates  in  Theorem  2.1  do  not  seem  to  be  sharp,  see  the 
numerical  evidence  in  section  2.3. 

vii)  If  u<k>  is  an  eigenvector  of  the  discrete  bi harmonic  operator  de- 

(kl 

fined  using  quadratic  extrapolation  near  the  boundary,  and  A^  '  is  the 
corresponding  eigenvalue,  then 

max|u^k^(P)  -  u<k>|  <  c  |  log  h"1^  h2 
PeR  n 

|*W  -  x<k>|  <  c  h2 

provided  the  exact  eigenfunction  u^  e  C^.  This  result  is  due  to 
Kuttler  [l97l]. 

2.3  Numerical  study  of  discretization  errors. 

In  this  section  the  results  of  numerical  calculations  using  the 
finite  difference  methods  from  2.1,  are  compared  with  the  theoretical 
results  of  2.2.  The  new  fast  computer  algorithms  developed  in  Chapter 
III  and  described  in  more  detail  in  Chapter  IV,  make  it  possible  to  solve 
problems  for  a  wide  range  of  grids.  The  asymptotic  behavior  of  the 


■  A 
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discretization  error  as  h  tends  to  zero  can  then  be  investigated.  The 
10  testproblems  listed  in  Appendix  III  were  used.  Scattered  results  for 
several  of  these  problems  on  coarse  grids  can  be  found  in  the  literature 
(see  Appendix  III).  Each  problem  was  solved  in  the  unit  square 
0  £  x,y  <_  1,  using  a  uniform  grid.  The  results  will  be  presented  in 
tables  like  the  one  below. 


Solution 

First 

derivative 

Second 

derivative 

Max 

L2 

Max 

L2 

Max 

L2 

nw  ^2 

R1 

R2 

R3 

R4 

R5 

R6 

where  h^  and  h^  specifies  two  different  grids,  and 


R,  = 


max|u(P)  -  u.  (P) | 

log(f£S - !*_  , 

maxju(P)  -  u.  (P)| 
PeR _ n2 

hl 

log(Tp) 

n2 


If  the  discretization  error  behaves  like 

a,  a- 

max|u(P)  -  uh(P)|  "Cj  h  +  Cg  h  +  ...  (o^  >  o^) 

PeR 

then  R^  will  represent  a  computed  approximation  to  (assuming 
c^h  1  »  C£h  z).  Define  e(P)  =  u(P)  -  uh(P)  for  PeR. 

R..,  i  =  2, 3, 4, 5, 6  is  then  defined  in  the  same  way  as  Rj  using  the  fol 
lowing  norms: 


*3  :  •  'VP)I] 
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*4  :hlpfR  (e*(P)?  +  ey(p)2>]‘S 

R5  :J«[leo<l>>l  ’  leyy(p)l  -  I*xy(p)l] 

R6  :  h[p^R(exx(p>2  *  VP)2  *  2  exy(p)2)]'i  ■ 

Note  that  the  discrete  derivatives  of  the  error  e(P); 

ex(p),  ey(P>,  e„x(P),  eyy(P>  and  exy(P) 

2 

have  been  formed  using  centered  (0(h  )  accurate)  differences.  The  same 

★ 

norms  will  be  used  when  considering  the  boundary  layer  P  e  Rh  except  that 

c 

the  factor  h  is  replaced  by  h^  in  R^,  R^,  and  Rg. 

Remark . 

The  discrete  derivatives  formed  by  centered  differences  of  the  com¬ 
puted  pointwise  error  in  the  solution  have  been  computed.  An  alternative 
would  be  to  compare  the  finite  difference  approximations  obtained  from  the 
computed  solution  with  the  exact  derivatives.  The  two  methods  give  the 
same  information  as  long  as  R-  £  2.  Since  the  discretization  error  due 
to  the  finite  difference  approximation  of  the  continuous  problem  is  best 
studied  using  the  first  method,  only  these  results  are  presented. 

First,  the  13-point  formula  combined  with  quadratic  extrapolation  will 
be  considered.  Problem  1  is  solved  exactly  by  the  method,  results  for 
problems  2,  7  and  10  are  given  in  Figure  2.3. 


Solution 

First 

derivative 

Second 

derivative 

Problem  2 

Max 

L2 

Max 

L2 

Max 

L2 

0.1  /  0.05 

1.96 

1.97 

HI 

1.09 

1.58 

0.05  /  0.025 

1.98 

1.99 

1.02 

1.72 

0.025  /  0.0125 

2.00 

2.00 

1.85 

1.94 

1.00 

1.80 

0.0125  /  0.00625 

2.00 

2.00 

1.92 

1.97 

1.00 

1.84 

Problem  7 

0.1  /  0.05 

1.95 

1.95 

1.25 

1.69 

0.95 

1.45 

0.05  /  0.025 

1.98 

1.99 

1.61 

1.85 

0.97 

1.64 

0.025  /  0.0125 

2.00 

2.00 

1.77 

1.92 

0.98 

1.74 

0.0125  /  0.00625 

2.00 

2.00 

1.86 

1.96 

0.99 

1.80 

Problem  10 

0.1  /  0.05 

1.95 

1.97 

1.38 

1.74 

1.02 

1.56 

0.05  /  0.025 

1.99 

2.00 

1.70 

1.87 

0.99 

1.70 

0.025  /  0.0125 

2.00 

2.00 

1.83 

1.93 

0.99 

1.78 

0-0125  /  0.00625 

2.00 

2.00 

1.89 

1.97 

1.00 

1.83 

Figure  2.3.  Computed  discretization  error  estimates  for 
problem  2,  7  and  10  using  the  quadratic 
boundary  approximation. 


None  of  the  other  test  problems  had  convergence  rates  significantly 
slower  than  the  ones  listed  above.  Improved  rates  were  observed  for 
problems  4,  5  and  9  where  Rg  s  Rg  z  2.  Figure  2.4  gives  the  rate  of 
convergence  of  the  solution  at  the  points  P  e  Rh  for  problem  7. 


1 
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Problem  7 

Solution 

First 

derivative 

Second 

derivative 

P  £  R* 

u 

Max 

Max 

L2 

Max 

L2 

0.1  /  0.05 

2.56 

2.53 

m 

■ 

0.95 

1.11 

0.05  /  0.025 

2.76 

2.74 

■Sill 

0.97 

1.32 

0.025  /  0.0125 

2.85 

2.85 

1.77 

1.78 

0.98, 

1.40 

0.0125  /  0.00625 

2.91 

2.92 

1.86 

1.87 

0.99 

1.45 

Figure  2.4  Problem  7.  Convergence  of  boundary 

layer  R^  using  the  quadratic  boundary 
approximation. 

Improved  convergence  at  the  points  P  e  R*  was  again  observed  for 
problems  4  (R^  ~  R^  2  2),  5  and  9,  where  R^  =  R^  =  4,  Rj  s  R^  :  3, 

and  Rj  «  Rg  :  2, 

Based  on  the  numerical  evidence,  the  discretization  error  estimates 
given  in  Figure  2.5  are  believed  to  be  correct  for  smooth  functions. 
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Figure  2.5  Asymptotic  behavior  of  discretization  errors. 


The  entries  in  Figure  2.5  marked  with  a  star  correspond  to  estimates 


that  are  sharp  in  Theorem  2.1.  In  addition,  the  estimate  for  the  maximum 
error  in  the  solution  over  R  is  almost  sharp.  A  specific  numerical 


-1  h 

calculation  strongly  suggested  that  the  factor  | 1 og  h  y  in  Theorem 
2.1  can  be  removed. 

The  discretization  error  in  the  second  derivatives  is  of  particular 
interest.  The  second  derivatives  represent  important  physical  quantities 
both  in  the  theory  of  elasticity  and  in  fluid  applications.  The  calcu¬ 
lations  clearly  indicate  that  the  maximum  error  is  proportional  to  h. 
Theorem  2.1  states  that  the  l_2  error  is  bounded  by  h3^2,  and  Zlamal 
|l967  suggests  that  this  is  in  fact  sharp.  However,  the  numerical  re- 

vn 

suits  indicate  that  the  rate  is  h  .  A  possible  explanation  of  this 

-1/2 

rate  would  be  a  boundary  layer  of  thickness  0(h  '  )  with  errors  of 
3/2  2 

0(h  )  and  interior  errors  of  0(h  ).  A  special  calculation  showed 

2 

that  the  error  in  the  interior  indeed  behaved  like  0(h  ).  The  value  of 
Rg  is  1.86,  1.86  and  1.84  for  problems  2,  7  and  10  when  using 

hl  =  2^0  and  h2  =  256  consistent  with  the  value  V3.5  =  1.8708.  (Con- 

2 

vergence  is  slow  since  the  next  term  in  the  error  expansion  (ch  )  is 
only  slightly  smaller.) 

The  very  regular  behavior  of  the  truncation  error  when  using  qua¬ 
dratic  extrapolation  near  the  boundary  suggests  that  Richardson  extrapo¬ 
lation  will  be  quite  effective.  Figure  2.6  displays  the  corresponding 
results  of  problem  7  after  one  Richardson  extrapolation.  (Using  h  and 
h/2). 
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Figure  2.6  Problem  7.  One  Richardson  extrapolation. 

It  should  be  pointed  out  that  a  precise  knowledge  of  the  asymptotic  dis¬ 
cretization  error  is  vital  when  doing  Richardson  extrapolation.  Gupta 
|l979^  reports  that  the  maximum  error  in  the  solution  of  problem  5  de¬ 
creases  from  0.07  with  h  =  0.05  to  0.05  when  extrapolating  using 

h  =  0.1  and  h  =  0.05.  His  extrapolation  was  based  on  an  expansion  with 
3/2 

a  leading  term  h  ,  if  the  correct  extrapolation  is  performed  the  er¬ 
ror  decreases  from  0.07  to  0.002. 

Next,  consider  the  discretization  errors  when  using  the  13-point 
formula  in  combination  with  cubic  extrapolation  near  the  boundary.  Pro¬ 
blems  1  and  3  are  solved  exactly  by  this  approximation.  Figure  2.7  shows 
the  results  for  problems  2,  7  and  10,  while  Figure  2.8  shows  the  rate  of 
convergence  near  the  boundary  for  problem  7. 
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Figure  2.7  Computed  discretization  error  estimates 
for  problems  2,  7  and  10  using  the  cubic 
boundary  approximation. 
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Figure  2.8  Problem  7.  Boundary  layer  Rh  using 
cubic  boundary  approximation. 


The  behavior  is  not  as  consistent  as  in  the  previous  case,  making 

Richardson  extrapolation  less  attractive.  Notice  that  the  error  in  the 

second  derivative  near  the  boundary,  is  converging  at  a  much  better  rate 

than  in  the  first  case.  The  theory  in  section  2.2  indicates  that  this 

2 

method  should  be  0(h  )  in  both  solution,  first  and  second  derivative. 
The  rate  of  convergence  of  the  solution  at  the  boundary  Rh  would  be 
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3.5  if  Lemma  2.2  applied. 

The  4th  order  approximation  {2.9)  with  boundary  formulas  taken  from 
Zurmuhl  [l957]  was  tried  with  h  =  J(J  •  J5  and  ^0*  Again  problems  1  and 
3  are  solved  exactly  by  the  approximation.  Results  for  problem  7  are 
given  in  Figure  2.9. 
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Figure  2.9  Problem  7.  A  4th  order  accurate  method. 

Despite  fairly  large  variations  in  the  computed  rates  of  convergence, 

all  errors  are  reduced  by  a  factor  of  four  or  more  indicating  that  the 

method  is  fourth'  order  accurate.  If  Lemma  2.1  and  2.2  were  valid  in  this 

★ 

case,  then  the  results  for  the  boundary  layer  Rh  would  be  5.5.  Per¬ 
haps  more  important,  the  complicated  approximation  formulas  used  near  the 

boundary  may  not  be  necessary.  An  approximation  of  0(h)  would  probably 

2 

suffice  in  order  to  get  accurate  function  values,  while  0(h  )  may  be 

4 

necessary  to  get  0(h  )  accuracy  also  for  the  second  derivatives. 

The  results  given  above  reflect  the  strong  smoothing  properties  of 
the  bi harmonic  operator.  The  errors  in  the  interior  behave  nicely  even 
for  higher  discrete  derivatives.  Close  to  the  boundary  the  situation  is 
much  more  complex,  a  boundary  approximation  which  is  3  orders  less  accu¬ 
rate  than  the  interior  approximation  is  sufficient  in  order  to  obtain 
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good  convergence  for  the  solution  and  the  first  derivative,  if  the  approxi¬ 
mation  is  2  orders  less  accurate,  then  the  second  derivatives  also  con¬ 
verge  at  an  optimal  rate. 

In  order  to  compare  the  relative  accuracy  of  the  four  schemes  dis¬ 
cussed  in  this  section.  Figure  2.10  displays  the  actual  error  (and  the 
various  centered  differences  of  the  error)  for  problem  7. 

Figure  2.10  indicates  that: 

1)  The  cubic  boundary  extrapolation  produces  more  accurate 
results  than  the  quadratic  approximation  on  a  given  grid, 
ii)  Richardson  extrapolation  is  very  effective  when  using  the 
quadratic  boundary  approximation, 
iii)  On  smooth  problems  like  the  ones  considered  here,  the 
fourth  order  method  produces  excellent  results. 

Before  closing  this  section,  the  importance  of  a  good  set  of  test 
problems  should  be  mentioned.  This  study  revealed  many  cases  where  one 
or  more  terms  in  the  (unknown)  error  expansions  dropped  out  for  a  given 
problem.  In  particular  problems  4,  5  and  9  are  rather  special  and  give 
atypical  results.  (These  problems  have  been  considered  by  several  authors 
in  the  past.)  In  many  applications  the  problems  will  be  less  smooth  than 
the  above  test  problems.  In  such  cases  a  fine  grid  calculation  with  a 
second  order  accurate  method  is  likely  to  be  more  satisfactory  than  a 
high  order,  coarse  grid  calculation. 


-31- 


2.4  A  brief  survey  of  other  methods. 

A  large  number  of  papers  proposing  numerical  algorithms  for  the  ap¬ 
proximate  solution  of  the  continuous  problem  (1.1)  have  appeared  in  the 
literature.  The  rapid  development  of  increasingly  faster  computers  in  the 
last  two  decades  has  made  it  feasible  to  actually  solve  finite  difference 
approximations  to  the  biharmonic  equation  proposed  and  theoretically  in¬ 
vestigated  as  early  as  1928  in  the  important  paper  by  Courant,  Friedrichs 
and  Lewy. 

Today,  there  is  a  considerable  interest  not  only  in  the  various  dis¬ 
crete  approximations  of  a  given  continuous  problem,  but  also  in  the  com¬ 
putational  complexity  of  the  discrete  problem  itself.  The  solution  of 
the  discrete  Poisson  equation  is  a  good  illustration.  In  the  last  fifteen 
years  many  efficient  numerical  methods  have  been  developed.  (Hockney 
1965  ,  Buneman  [l969] ,  Buzbee,  Golub  and  Nielson  |l97oj.  Bank  and  Rose 
1977  and  Schroder,  Trottenberg  and  Witsch  Jl978].)  When  solving  the 
problem  on  an  N  by  N  grid  0(N  )  arithmetic  operations  and  0(N  ) 
storage  is  needed.  A  method  having  this  complexity  is  said  to  be  opti¬ 
mal.  (Actual  computer  implementations  often  make  use  of  the  fast  Fourier 

transform  or  the  idea  of  cyclic  reduction  resulting  in  nearly  optimal 

2 

methods  having  an  operation  count  of  0(N  logN).)  These  methods  can  all 
be  viewed  as  efficient  computer  implementations  of  the  separation  of 
variables  technique. 

However,  separation  of  variables  cannot  be  applied  to  the  biharmonic 
problem  (1.1).  The  methods  proposed  in  earlier  papers,  for  the  solution 
of  the  discrete  problem  that  arises  when  using  the  13-point  stencil  have  not 
been  optimal.  The  main  result  of  the  next  chapter  is  to  show  that  a 
numerical  method  of  optimal  complexity  does  exist,  even  though  the  matrix 
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corresponds  to  a  nonseparable  problem.  (It  seems  however,  that  optimal 
numerical  methods  for  this  type  of  problems  do  require  the  use  of  an 
iterative  process.) 

The  methods  that  have  been  proposed,  for  solving  the  linear  system 
of  equations 

Ax  =  b 

derived  from  the  13-point  stencil  can  roughly  be  classified  as  follows: 

i)  Iterative  methods  working  on  the  matrix  A. 

ii)  Direct  methods  working  on  the  matrix  A. 

iii)  Iterative  methods  based  on  reducing  the  biharmonic  problem 
to  a  coupled  system  of  two  second  order  equations  involving 
the  Laplace  operator. 

iv)  Direct  methods  taking  advantage  of  the  fact  that  A  can  be 

2 

split  into  L  +  V,  where  L  is  the  discrete  Laplace  operator 
and  V  has  low  rank. 

The  first  approach  i)  can  be  found  in  many  early  papers  on  the  sub¬ 
ject;  Parter  |^1959j  and  Conte  and  Dames  ^196C)j.  A  more  recent  paper  using 
a  strongly  implicit  scheme  is  Jacobs  ^1973^.  The  main  disadvantage  of 
approach  i)  is  related  to  the  fact  that  A  has  condition  number  propor¬ 
tional  to  resulting  in  slow  convergence  of  the  iterative  techniques. 
(Munksgaard  ^1 980~j  reports  that  more  than  500  conjugate  gradient  itera¬ 
tions  are  required  already  for  N  =  32.) 

Approach  ii)  has  recently  received  more  attention  due  to  a  better 
understanding  of  sparse  methods  for  Gauss  elimination.  The  theoretical 

3 

complexity  of  a  direct  method  using  nested  dissection  is  0(N  )  arith- 

2 

metic  operations  and  0(N  logN)  storage  locations.  Nested  dissection 
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and  other  sparse  matrix  methods  for  the  problem  were  studied  by  Sherman 

[l975].  His  results  indicate  that  the  constants  in  the  above  estimates 

are  quite  large  and  that  a  regular  band  solver  (0(N  )  work,  0(N  ) 

storage)  is  competitive  even  when  the  number  of  unknowns  approach  one 

thousand.  Bauer  and  Reiss  |l972]  proposed  a  block  elimination  scheme, 

while  Gupta  and  Manohar  Jl979]  used  a  band  solver.  Both  these  methods 

require  a  prohibitive  amount  of  storage  if  N  is  large  and  they  have  a 

4 

typical  running  time  proportional  to  N  ,  unacceptable  for  fine  grid  cal¬ 
culations. 

The  third  and  fourth  approach  are  essentially  two  different  ways  of 
looking  at  the  same  underlying  problem.  A  method  based  on  iii)  above  was 
introduced  by  Smith  |l968] .  It  had  a  running  time  of  0(N3).  This  was 
later  improved  to  0(N5,/2 )  by  Smith  [l97oj  ,  |l973j  ,  Ehrlich  [l97l]  .  (See 
also  Ehrlich  and  Gupta  [l975 J . )  A  drawback  is  the  need  to  estimate  itera¬ 
tion  parameters.  Recently  VajterSic  ^197 9J  presented  a  more  efficient 
implementation  of  these  ideas,  but  the  complexity  of  the  method  remained 
0(N5/2). 

The  last  appraoch  iv)  was  pioneered  by  Golub  [l97lj  and  a  refined 
implementation  is  given  by  Buzbee  and  Dorr  Jl974j .  This  implementation, 
which  is  a  direct  method,  requires  0(N3)  arithmetic  operations.  Des- 
pite  being  an  0(N  )  method  it  proved  very  competitive  with  the  0(N  ) 

methods  on  realistic  problems  because  those  methods  have  an  actual  cost  of 

5/2  3 

c  N  with  c  substantially  larger  than  the  constant  in  the  0(N  ) 

estimates. 

Based  on  the  above  results  Sameh,  Chen  and  Kuck  |l976j  concluded  that 
the  solution  of  the  first  biharmonic  problem  was  an  order  or  magnitude 


computers.  The  results  of  this  thesis  show  that  the  problems  have  the 
same  complexity. 

There  are  many  alternative  ways,  not  using  finite  differences,  for 
obtaining  an  approximate  solution  to  the  biharmonic  equation.  A  few  will 
be  mentioned  here. 

i)  Finite  element  methods. 

An  extensive  literature  exists.  The  methods  of  solution  are  most 
often  sparse  Gaussian  elimination.  Recent  contributions  proposing 
alternative  ways  of  solving  the  resulting  linear  equations  include 
Axelsson  and  Munksgaard  [l979]  and  Glowinski  and  Pironneau  1 1 97 9^ - 

ii)  Least  squares  methods. 

Methods  of  this  type  are  often  called  "point  matching  methods"  in 
the  engineering  literature  while  the  name  "method  of  particular 
solutions"  sometimes  is  used  by  numerical  analysts.  This  approach 
can  be  very  effective  for  special  problems.  References  include 
McLaurin  |l968j,  Sigillito  [l976]  and  Rektorys  Jl979j  . 

iii)  Integral  equation  methods. 

A  large  number  of  papers  have  appeared  and  the  theoretical  founda¬ 
tion  is  well  understood.  (See  the  references  given  in  Chapter  I). 

A  few  recent  papers  are  Katsikadelis  |l977l,  Richter  J 1977 J  and 
Cristiansen  and  Hougaard  |l978j. 

iv)  Methods  using  Fourier  series  expansions. 

A  few  papers  construct  the  solution  of  the  first  biharmonic-  pro¬ 
blem  in  a  rectangular  region  using  infinite  Fourier  expansions. 
References  to  work  in  this  direction  include  Aronszajn,  Brown  and 
Butcher  [  1 973*1 .  Vaughan  ^1974j  and  Rahman  and  Usmani  [l977j  . 


-- -» -  * 


v)  Methods  using  mathematical  programming. 

Linear  programming  techniques  have  been  used  by  Cannon  and  Cecchi 
[l966j,  [ 1 967j  and  Dessi  and  Manca  ^1976j  while  Distefano  |l97lj 
reports  on  the  use  of  a  continuous  dynamic  programming  technique. 
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CHAPTER  III 

AN  0(N2)  METHOD  FOR  THE  SOLUTION  OF  THE 
FIRST  BIHARMONIC  PROBLEM 

Consider  the  Oirichlet  problem  for  the  bi harmonic  operator  in  a 
rectangle  R. 

A2u(x,y)  =  f(x,y)  (x,y)  e  R 

u(x,y)  =  g(x,y)  (x,y)  e  3R  (3.1) 

un(x,y)  =  h(x,y)  (x,y)  e  3R 

Here  up  denotes  the  normal  derivative  of  u  with  respect  to  the  ex¬ 

terior  normal. 

A  new  and  more  efficient  solution  technique  will  be  described  for  the 
case  whenthe  above  system  is  discretized  using  the  standard  13-point 
stencil  combined  with  quadratic  extrapolation  at  the  boundary.  It  will 
be  shown  that  by  using  this  method,  the  solution  of  the  discrete  problem 
on  an  N  by  N  grid  can  be  computed  in  0(N  )  arithmetic  operations. 

This  is  an  order  of  magnitude  faster  than  earlier  methods.  In  addition, 
the  storage  requirement  is  also  significantly  reduced  compared  to  pre¬ 
viously  published  algorithms. 

The  theory  in  this  chapter  does  not  uniquely  define  a  numerical  al¬ 
gorithm.  In  fact,  it  will  become  clear  that  there  are  several  ways  of 
2 

implementing  0(N  logN)  methods  as  well  as  an  even  faster  direct 
2  3 

0(N  logN)  method  requiring  0(N  )  operations  in  a  preprocessing  stage. 

It  should  be  pointed  out  that  the  logN  term  only  arises  when  doing  a 
fast  Fourier  transform  that  can  be  associated  with  solving  Poisson's 
equation  on  the  given  grid.  Several  methods  for  solving  Poisson's 
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2 

equation  in  only  0(N  )  operations  are  known.  Work  in  this  direction 

has  been  reported  by  Banks  ^197s]  and  Detyna  ^1979],  while  Swarztrauber 

[l977]  gives  an  0(N21oglogN)  method. 

It  is  possible  to  use  one  of  these  methods  as  a  subprogram  in  the 

algorithms  described  in  this  chapter,  and  this  would  result  in  a  fast  bi- 

2 

harmonic  solver  requiring  only  0(N  )  arithmetic  operations  to  achieve 
a  prescribed  accuracy. 

There  are  at  least  four  reasons  for  keeping  the  discrete  Fourier 
transform  (and  therefore,  the  logN  term)  in  this  description  of  the  new 
method . 

i)  The  theory  becomes  clearer  and  more  coherent, 

ii)  The  0(N  )  methods  for  Poisson's  equation  are  still  research 
codes  of  limited  availability  and  several  have  problems  with 
numerical  instabilities. 

iii)  The  fast  Fourier  transform  is  a  more  widely  used  computa¬ 
tional  tool.  Very  efficient  codes  already  exist  and  hard¬ 
ware  implementations  are  likely  to  exist  on  many  computer 

2 

systems  in  the  future.  The  constant  in  front  of  the  N  logN 

term  is  also  quite  small  compared  to  the  constant  in  front 
2 

of  the  N  term.  Under  these  circumstances  the  logN 
penalty  may  be  of  little  significance  in  actual  computation, 

iv)  The  fast  Fourier  transform  is  used  anyway  in  a  different  part 
of  the  algorithm.  (It  only  makes  an  O(NlogN)  contribution 
to  the  operation  count  in  this  part,  so  a  slow  Fourier  trans¬ 
form  would  not  change  the  asymptotic  efficiency). 


A  more  detailed  analysis  of  several  variants  of  the  algorithm  with 
precise  descriptions  of  actual  computer  implementations  including  storage 


requirements  and  operation  counts  is  given  in  Chapter  IV. 

In  this  section  the  structure  and  properties  of  the  discrete  matrix 
problem  corresponding  to  (3.1)  will  be  analyzed.  Since  the  basic  method 
of  solution  is  closely  related  to  this  structure,  the  analysis  will  be 
carried  out  as  a  constructive  derivation  of  the  algorithm. 

Assume  that  the  rectangle  R  is  discretized  using  a  grid  with  M 
uniformly  spaced  interior  gridpoints  in  the  x-direction  and  similarly  N 
points  in  the  y-direction.  The  resulting  linear  system  of  MN  equations 
is 

=  b 

with  (uh)-jj*  i  *  1,2, ...M,  j  =  1,2, ...N  denoting  the  discrete  approxi¬ 
mation  to  the  continuous  solution  u(x,y)  at  the  coordinate  (iAx,  jAy). 
The  vector  b  is  given  by 

bjj  =  (Ay)4  f ( i Ax ,  jAy)  +  Z.j 

where  the  sparse  vector  £  is  a  linear  combination  of  the  boundary  data 
corresponding  to  the  quadratic  boundary  approximation  discussed  in  Chap¬ 
ter  II. 

In  order  to  discuss  the  efficient  numerical  solution  of  this  system 
some  notation  is  needed. 

Let 

6  =  Ay/ Ax 

and  define  the  two  matrices 


~2  -1  1 
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' 1 

- 

-1  2  . 

0 

o 

•  «  • 

•  •  • 

T„  = 

• 

• 

•  •  • 

.  2  -1 

N 

0 

• 

0 
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N*N 

1_ 

Let  I.,  denote  an  N  x  N  identity  matrix.  The  matrix  A  can  be  writ- 
N 

ten  as 

A  =  [s2(In®Rm)  +  (RN®IM)]2  +  2(Tn®Im)  +  264(IN(g>TM)  (3.2) 

Standard  tensor  product  notation  is  used,  i.e. 


C  =  (dn®  em) 


denotes  the  block  NM  x  NM  matrix  (with  blocksize  M) 


C  = 


d^E  .  .  .  dj^E 

.  d22E 


(_dNlE  *  *  '  dNNE 


NMxNM 


Note  that  the  matrix 


L  -  «2(IN®RM)  +  (Rn®im) 


(3.3) 


is  nothing  but  the  matrix  that  results  when  solving  Poissons  equation  on 
the  same  grid  using  the  standard  5-point  difference  approximation  to  the 
Laplace  operator. 
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Cons  ider  the  N  *  N  symmetric  matrix 


(3.4) 


It  is  easy  to  show  that  the  vectors  i  *  1,2. ..N  are  the  normalized 
eigenvectors  of  and  that 


qnrnqn  =  an 

QN  =  QN  =  QHl  (3.5) 

An  =  diag(Aj ) 

Aj  =  2 (1-cos  jj^-)  j  =  1.2...N 

Notice  that  the  operation  of  computing  y  =  Q^x  for  a  given  vector  x 
of  length  N  is  just  a  real  sine-transform  of  x.  It  can  therefore  be 
carried  out  in  O(NlogN)  arithmetic  operations  using  the  fast  Fourier 
transform.  For  this  discussion  the  MN  x  MN  permutation  matrix 
P,  PTP  =  I  defined  by  the  relation 

p(0N®E„)pT  ■  (Em®Dn) 


is  also  needed.  If  P  acts  on  the  vector  uh  it  will  reorder  the  un 
knowns  by  columns  (vertically),  instead  of  by  rows  (horizontally).  It 
is  clear  that  this  involves  no  arithmetic  operations. 

The  matrix  is  of  rank  2  and  it  can  be  written 

TN  *  UNUI 

where 
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Def ine 


and 


Q  -  (1„®QH) 


kn  ’  Vn 


Now  consider 


/V  /\ 


P  Q  A  Q  PT  =  64(A^0In)  +  (Im®r2)  +  2  62(AM<g>RN) 

+  2(Im®unuJ)  +  2  64(KmkJ®In) 

=  S  +  2  64(KmkJ®In)  . 


(3.6) 


This  defines  the  block  diagonal  matrix 


O 


s  * 


o 


M 


NMxNM 


Explicitly  written  out,  block  number  k  of  S  has  the  following  penta- 
diagonal  structure: 


7+4S2\k+64X2,  -4-262Xk  ,  1 


Sk  = 


-4-262X, 


,  6+462Xk+64X2, 


1 

6+462Xk+64X2,  -4-262Xk 
1  -4-2S2Xk  ,  7+462Xk+64x2 


(3.7) 


N*N 


After  these  transformations  the  problem  is  reduced  to 


[S  +  264(KMKj<2)IN)]v  =  c  (3.8) 

A  A 

where  the  transformed  variables  v  =  PQuh  and  c  =  PQb  have  been  intro¬ 
duced.  It  is  important  to  notice  that  the  matrix  (<£)IN)  has  rank 

2N  only. 

The  following  generalization  of  the  Sherman-Morrison  formula  is  well 
known  (Dahlquist  and  Bjorck,  [l974]). 

Let  E  e  Rnxn  be  nonsingular,  V  e  Rnx*D,  and  W  e  Rnx^.  Then 

(E  +  VWT)_1  =  E-1  -  E"1V(I  +  WTE-1V)-1WTE'1  (3.9) 

provided  that  the  pxp  matrix  (I  +  W^E"^V)  is  nonsingular. 

Applying  this  formula  to  equation  (3.8)  makes  it  possible  to  write 
down  an  explicit  expression  for  the  solution  uh  (returning  to  the  ori¬ 
ginal  variables). 

uh  =  <IN®QM)pT  [l-264S’1(KM®IN)B-1(Kj0IN)]s-1P(IN®QM)b  (3.10) 

where  B  is  the  2N  x  2N  matrix 

B  =  I  +  2<54(kJ,<2)In)  S'V^)  .  (3.11) 

By  looking  at  the  different  matrices  in  (3.10),  performing  the  op¬ 
erations  from  right  to  left  it  is  clear  that: 

i)  (In®Qm  )  requires  O(NMlogM)  (N  fast  Fourier  transforms 

operations.  of  length  M). 

ii)  P  requires  no  operations.  (A  permutation  only). 

/'_i 

iii)  S  requires  0(NM)  opera-  (M  pentadiagonal  systems  of 
tions.  size  N). 
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iv)  (kJ(^)In)  requires  O(NM)  ^  has  sparse  simple 

operations.  structure). 

v)  B'1  requires  ?  operations.  (B  has  not  been  analyzed  yet). 
In  this  way,  the  design  of  an  efficient  method  has  been  reduced  to 
the  fast  solution  of  a  linear  system  with  coefficient  matrix  B.  In  the 
following  a  careful  study  of  the  matrix  B  is  made  and  a  method  of  solving 
such  linear  systems  in  no  more  than  0(NM)  arithmetic  operations  is  ob¬ 
tained. 

In  order  to  do  this,  taking  advantage  of  the  structure  in  B,  the 
matrix  can  be  written: 

B  =  I  +  264(KTm<S>In)  S_1  (Km®In) 


=  I+26 


ql  1 1 N  •”  qMl!N 


qlMXN  **•  qMM*N 


ql 1 1 N  qlM*N 


SM  J  LWn  qMMIN 


=  I  +26 


M  2  -1  M  -1 

c=l  q,dSk  ’  k=l  qklC,kMSk 


M  -1  M  2  -1 
qklqkMSk  ’  qkMSk 


2Nx2N 


■\^T 


qkl  "  VM+1  5  n  M+T 

_  «.  /  i  \  k+1  _ 

qkM  '  qkl 


(3.12) 
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Def ine 


odd 


even 


Therefore 


B  »  I  + 


45 

M+I 


M 

.  2 

kir  C-1 

Z 

k=l,3,5. 

sin 

fPTsk 

M 

.  2 

kir  c-1 

Z 

k=2,4,6. 

sin 

M+r  Sk  • 

:  . .  +  S 
odd  even 

^odd  "  ^even 

-  s 

‘odd  even 

^odd  +  ^even 

u 

Lr  system 

Bx  = 

d.  Partition 

(3.13) 

(3.14) 


(3.15) 


2Nx2N 


2  2 

into  subvectors  of  length  N  consistent  with  the  partitioning  of  B. 

By  adding  and  subtracting  equations  this  system  splits  into  two  linear  sys¬ 
tems  each  of  size  N  x  N: 

(I  +  -8-L..  $ 

\  L  Mai  -> 


W  (xl  +  x25  =  dl  +  d2 

(3.16) 

’even)(xl  "  x2*  =  dl  '  d2  * 

(3.17) 

It  will  later  be  shown  that  these  problems  can  be  split  further  into 
four  symmetric  positive  definite  matrix  problems  each  of  size  N/2.  (N/2 

will  be  used  to  denote  both  (N+l)/2  and  (N-l)/2  if  N  is  odd,  the 
actual  value  being  clear  from  context.)  However,  since  all  known  practi¬ 
cal  direct  methods  for  solving  a  general  dense  linear  system  of  equations 
of  order  N  require  0(N  )  arithmetic  operations,  it  is  natural  to  study 
possible  iterative  methods.  (There  exist  certain  direct  methods  for  very 
special  classes  of  matrices,  for  example  Toeplitz  matrices,  with  a  lower 
operation  count,  but  the  matrices  under  consideration  do  not  seem  to  belong 
to  any  such  class).  Notice  also  that  the  matrices  SQdd  and  Sgven  are 
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2 

defined  by  rather  complicated  relations.  In  fact,  it  would  require  0 ( N  M) 
arithmetic  operations  to  generate  all  the  explicit  matrix  elements.  Since 
all  the  pentadiagonal  blocks  of  S  are  symmetric  and  positive  defi¬ 

nite,  it  follows  that  both  Sodd  and  Seven  have  the  same  properties. 

A  very  attractive  iterative  scheme  for  the  solution  of  a  symmetric 
positive  definite  linear  system 

Ax  =  b 

is  the  conjugate  gradient  method.  From  an  arbitrary  initial  vector  x^ 
the  method  generates  a  sequence  of  approximations  (xn)  to  the  solution 
x  defined  by 


xn+l  xn  +  anpn  *  an  (Ap'n!p'n) 


pn+l  "  rn+l  +  ^npn 


(rn+l,rn+l' 

"  TvrJ- 


(3.18) 


where 


rn  ’  b  -  Axn  an<l  p0  ’  r0 


The  method  is  due  to  Hestenes  and  Stiefel  ^1952^.  A  good  description  of 
the  method  and  some  of  its  properties  can  be  found  in  Luenberger  ^1973j  . 
The  iteration  does  not  require  knowledge  of  the  matrix  elements,  since 
only  matrix  vector  products  are  needed.  It  is  clear  from  the  structure  of 
S  j,  and  S  (3.13,  3.14)  that  a  matrix  vector  product  can  be  corn- 
puted  by  solving  M/2  of  the  pentadiagonal  systems  Sk-  The  cost  of  a 
matrix  vector  product  is  therefore  0(NM)  arithmetic  operations.  The 
number  of  iterations  required  to  achieve  a  given  accuracy  when  solving  a 
symmetric  positive  definite  system  of  linear  equations  Ax  =  b  using  con¬ 


jugate  gradients,  is  in  general  proportional  to  (u„, 

max  in  1  n 


where 
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N 

is  the  spectrum  of  A.  It  should  be  pointed  out  that  special  dis¬ 
tributions,  in  particular  clusters  of  eigenvalues,  will  lead  to  a  consi¬ 
derably  faster  rate  of  convergence.  (Kaniel  [l966j,  Stewart  ^1975],  Cline 
[l976 j ,  Jennings  ^1977^  ,  and  Greenbaum  [l979].) 

It  can  be  shown  using  the  results  of  Appendix  II  that  the  largest 
eigenvalues  of  the  matrices  given  in  (3.16,  3.17)  both  are  proportional  to 

M.  A  direct  application  of  conjugate  gradients  to  those  linear  systems 

3/2 

will  therefore  require  at  most  0(NM  )  arithmetic  operations.  This  is 
of  the  same  order  of  magnitude  as  the  method  known  in  the  literature  as 
"The  coupled  equation  approach"  described  and  studied  by  Smith  ^1968,  1970, 
1973],  Ehrlich  [l971,  1972,  1973],  Greenspan  and  Schultz  [l972],  McLaurin 
[l974]  and  Gupta  [l975]. 

The  iterative  techniques  proposed  in  these  papers  all  require  vari¬ 
ous  acceleration  parameters  to  be  estimated.  In  addition,  each  iteration 
amounts  to  solving  two  full  Poisson  problems.  The  actual  use  of  these 
methods  have  been  restricted  to  rectangular  regions  since  the  handling  of 
the  boundary  and  the  need  to  compute  normal  derivatives  there,  is  quite 
complicated  for  more  general  domains. 

The  use  of  a  conjugate  gradient  iteration  has  several  advantages 
over  the  iterative  methods  proposed  earlier.  The  method  requires  no 
estimation  of  iteration  parameters  and  it  takes  advantage  of  the  spectral 
distribution  of  the  linear  operator  in  an  optimal  way.  Thus  a  more  care¬ 
ful  study  of  the  spectrum  (see  Appendix  II)  reveals  that  it  clusters 

around  1  and  that  the  large  eigenvalues  behave  like  cM/i,  i  =  1,2,..  . 

1/3 

It  can  be  shown  that  the  conjugate  gradient  method  converges  in  0(M  ) 

iterations  if  the  arithmetic  is  exact.  Unfortunately  inexact  arithmetic 

1/2 

makes  the  actual  number  of  iterations  (using  3.18)  behave  more  like  0(M  ). 


-•'■W  <&*> 


-47- 


The  use  of  a  few  quasi-Newton  updates  (see  Chapter  IV)  or  selective  ortho¬ 
gonal  ization  (Parlett  p98oj)  as  part  of  the  conjugate  gradient  procedure, 
provides  a  remedy  for  this  problem. 

4/3 

With  an  operation  count  of  0(NM  ),  this  algorithm  is  faster  than 
those  mentioned  above.  In  practice,  even  with  a  standard  conjugate  gra¬ 
dient  implementation,  the  method  is  substantially  faster  than  previous 
algorithms. 

The  main  purpose  of  this  chapter  is  however,  to  show  that  linear 
systems  defined  by  the  matrix  B  (3.11)  can  be  solved  using  only  0(NM) 
arithmetic  operations. 

Suppose,  instead  of  applying  the  conjugate  gradient  method  directly 
to  a  linear  system  Tx  =  b,  that  it  is  possible  to  split  T  such  that 

T  =  T  -  R 

where  T  is  symmetric  positive  definite.  Assume  in  addition  that  it  is 
easy  to  solve  linear  systems  with  the  matrix  T.  In  such  a  case,  the  con¬ 
jugate  gradient  method  can  be  used  with  a  preconditioning  matrix  T  cor¬ 
responding  to  the  above  splitting  of  T.  This  can  equivalently  be  viewed 
as  applying  ordinary  conjugate  gradient  iteration  to  the  transformed  system 


but  working  with  the  original  variables  x  =  T-^  y  and  b  =  T*5  c.  The 

number  of  iterations  needed  in  order  to  achieve  a  given  accuracy  is  there 

fore  in  general  again  proportional  to  the  ratio  ^max  AJmin) 2,  but 
N 

{y i } i_l  are  now  the  eigenvalues  of  the  matrix 

K  =  T'1  T  =  I  -  T"1  R  . 


jlu.: 
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(K  is  of  course  similar  to  the  symmetric  matrix  T  T  T  2  given  above). 

A  good  analysis  of  this  technique,  including  numerical  algorithms, 
is  given  in  Concus,  Golub  and  O'Leary  ^1976^ .  If  is  an  approxi¬ 

mate  inverse  of  T  then  the  convergence  rate  will  be  much  improved.  Two 
different  effects  can  contribute  in  this  process. 

i)  The  ratio  Mrnax/^^^ p  -js  often  substantially  reduced  when 
considering  K  instead  of  the  original  matrix  T. 
ii)  Equally  important  is  the  fact  that  K  often  will  have 
clusters  of  eigenvalues.  Typically,  K  will  have  only 
p  «  N  eigenvalues  appreciably  different  from  1.  The 
number  of  iterations  required  for  convergence  will  then 
be  similar  to  the  number  required  for  a  problem  of  di¬ 
mension  p  with  the  corresponding  spectrum. 

The  next  few  pages  will  describe  how  to  find  a  splitting  of  the  present 
problem  (3.16,  3.17)  that  has  both  of  the  above  properties. 

Write 


sk  -  Sk  +  2UNUN 

=  Sk  +  +  2eNeN 


(3.20) 


Comparing  with  (3.5)  and  (3,7)  it  is  clear  that 

~sk  "  («Vn  +  rn>2  (3.21) 

and  therefore  all  the  matrices  ,  k  =  1,2,...M  have  the  same  set  of 
eigenvectors  represented  by  the  matrix  QN  (3.4). 
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QNSkQN  =  Vk 

\  =  diag^j)  (3.22) 

¥kj  =  (<s2W2  ’  j  =  1*2»--*N 


(Recall  that  A^  is  defined  in  (3.5)  and  that  this  definition  depends 
implicitly  on  the  range  k  is  running  over.) 

In  the  following,  let 

Ti  =  (I  +  WY  .  =A+?  Sir'2fesk1)  1  =  l>2  (3.23) 

K”  1  )  •  • 


represent  the  matrix  in  both  linear  systems  (3.16)  and  (3.17).  The  nota- 
M 

tion  E  indicates  that  the  summation  extends  over  odd  or  even  k 
k=i ,i+2 

(depending  on  i)  up  to  M.  Let  T..  represent  the  matrix 


(I  +  m+t 


k=i,i+2,.. 


wr  K1' 


i  =  1,2 


(3.24) 


where  S^1  has  replaced  S^1  in  (3.23).  and  T.  can  both  be  viewed 
as  discrete  approximations  to  certain  boundary  integral  operators  relating 


the  solution  of  (3.1)  to  the  solution  of  the  separable  problem  where  au 


is  specified  on  two  opposite  parts  of  the  boundary  instead  of  un>  In  par¬ 
ticular,  Ti  corresponds  to  a  separable  operator.  There  is  a  close  cor¬ 
respondence  between  these  operators  (in  the  rectangular  case)  and  the  in¬ 
tegral  operator  A  defined  in  Glowinski-Pironneau  [l979j. 

When  using  conjugate  gradients  to  solve  a  linear  system  involving 


the  matrix  T..,  consider  a  preconditioning  corresponding  to  the  following 
spl itting: 


Ti  -  Ti  -  (Ti  -  V  • 


(3.25) 
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Observe  that 


Ti 


"  V1 


854 

f¥?T 


M 

I  sin*1 
k=i , i+2. . 


krr 

WT 


(3.26) 


=  Q 


NDiQN 


i  »  1,2 


The  matrix  defined  in  (3.26)  is  diagonal  and  can  be  computed  in 
0(NM)  operations.  The  solution  of  linear  systems  involving  Ti  can 
therefore  be  performed  in  O(NlogN)  operations  once  the  matrix  0i  has 
been  computed  and  stored. 

Lemma  3.1 

The  matrices  T^,  T^  and  Ti  -  Ti  are  symmetric,  Ti  and  T^ 
positive  definite  and  Ti  -  Ti  positive  semi-definite. 

Proof : 


The  statement  about  T.  and  T^  follows  trivially  from  the  defi 
nitions  (3.5),  (3.7),  (3.23)  and  (3.24). 

Consider  the  matrix  T.  -  Tj 


.  T  =  X  7 

1  ^k-M*. 


si"2  fer 


(S 


-1 


-  S'1) 

\  1 


But,  using  (3.20)  and  the  Sherman-Morrison  formula  results  in 
S'1  -  S-’d^  UN(I2  ♦  2uJs;1Un)-1  uj  S-1)  . 

Thus 

K1  -  Sk1  *  2  K'  V!Z  *  • 

This  matrix  is  clearly  positive  semi-definite.  (~~) 

Lemma  3.2 

~-l  N 

Let  =  T.  T.  and  let  be  the  spectrum  of  K . ,  then 


o  <  Uik  i  1  ,  1  i  k  <  N  ,  i  =  1,2  . 

Proof: 

It  follows  from  Lemma  3.1  and  the  fact  that  T\*  T-  is  similar  to 
T,  that  M1k  >  0  . 

Let  x  be  any  vector,  xTx  =  1.  Using  Lemma  3.1  it  follows  that 
xT  T -  Ti)TT,s  x  >  0 


which  implies 


xT  TT*4  T,  x  <  1 


p,k<  1,  k  »  1,2.. .N  1-1,2  .  □ 


In  order  to  prove  that  the  preconditioned  conjugate  gradient  method 
proposed  above  converges  at  a  rate  independent  of  N,  a  theorem  giving 
more  precise  knowledge  than  Lemma  3.2  about  the  spectrum  of  Ki  is  needed. 

The  next  theorem  describing  a  matrix  decomposition  of  leads  to 
new  variants  of  the  algor ithm  as  well  as  better  knowledge  about  the  eigen- 


va,ues  {“ik!k»i 
Theorem  3.1 


For  i  e  {1,2,3,...}  ,  define  i  =  2i  -  <5.  ,  6..  =< 

r  rl  1J  10  i/j 


1  i=j 


0^  =  86  /(M+l)  and  ^  =  8/(N+l) 


^S  =  lt6sj.r!ra2,..5in2^IT’M  ' 


Let  PN  be  the  permutation  matrix  that  permutes  a  vector  x  e  R  odd- 
even,  i.e.,  if  x  has  components  (x^.x^.x^, . • -xN) ,  then  Px  has 


components  (x^,  x^.-.-x^,  x2»  x4»"*xfj)-  Then 


PN  %  Ti  In  p3  ■  r« 


(C11)1  c11 


o 


o 


(Ci2)T  ci2 


1  =  1,2 


NxN 


where  Crs  is  the  M/2  by  N/2  matrix  with  components 

r  =  1,2 

rs  _VBNgM  S1f1  M+T  S1n  jT+T  S  =  1,2 

Va*N  <x™  V,  ,  i  =  1,2,... M/2,  if  M  odd 


V  JS  VJs 


j  =  1,2, ...N/2,  Mill  if  n  odd 


Proof : 


See  Appendix  I.  d] 


First  observe  that  the  two  matrix  problems  associated  with  T^(i*l,2) 
have  been  split  into  a  total  of  four  smaller  problems.  This  reduces  the 
required  computer  storage,  but  this  fact  is  even  more  important  in  the  de¬ 
sign  of  a  direct  method.  The  reduction  into  four  subproblems  is  a  direct 
consequence  of  the  symmetry  of  the  biharmonic  operator  on  the  rectangle  R. 
A  second  important  observation  is  that  the  elements  c^  can  be  easily 
computed  after  some  initial  computation  of  the  quantities  that  appear  in 
the  above  formula.  This  requires  only  0(NM)  operations  and  0(N)  +  0(M) 
storage  and  provides  an  alternative  to  the  implicit  definition  of  T^ 
given  in  (3.23). 

It  is  now  possible  to  prove  a  stronger  result  than  Lemma  3.2.  Let 
be  the  singular  values  of  Cij,  and  let  be  the  eigen¬ 

values  of  K.j  =  TT*  T.j.  Clearly,  from  Theorem  3.1,  there  is  a  one  to  one 
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correspondence  between  u  and  a  given  by 

u  =  1  -  a2  (3.27) 

where  the  subscripts  and  superscripts  have  been  dropped  for  notational 
convenience. 

First  consider  the  case  where  M  =  N  and  6=1.  In  this  case 

22  12  21  T 

and  C  are  square  and  symmetric,  while  C  =  (C  )  is  almost  square. 

This  case  is  slightly  simpler  to  analyze  and  will  be  considered  first. 

Theorem  3.2 

N/2 

Assume  N  =  M  and  6  =  1.  Let  be  the  singular  values  of 

rs 

one  of  the  matrices  C  defined  in  Theorem  3.1.  Then 

0  £  ai  <  0.8 

independent  of  N. 

Proof : 

See  Appendix  II.  Explicit  expressions  for  the  matrix  elements  cT? 

*  J 

are  derived  in  the  limiting  case  N  -*■  °°.  A  simple  Gershgorin  estimate  can 
be  applied  in  this  case  to  give  an  upper  bound  for  the  largest  singular 
value  a.j.  The  largest  singular  value  ai  and  the  actual  computed  upper 
bound  are  shown  in  Figure  3.1  for  N  ranging  from  1  to  2047. 

Computations  show  that  om„„  always  belong  to  C^.  A  block  Lanzcos 

uiaX 

code  written  by  Underwood  [l975j  was  used  to  compute  the  eigenvalues  in 
Figure  3.i.  The  theoretical  Gershgorin  bound  when  N  tends  to  infinity 
is  also  indicated.  Although  sufficient  for  this  theory,  the  figure  indi¬ 
cates  that  the  bounds  are  not  very  sharp  for  realistic  values  of  N. 


L 
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1  2  3  U  5  6789  10  11  LoggCN+l) 


Figure  3.1.  The  largest  singular  value  as  a  function 
of  log2(N+l)  (below),  compared  with  the 
corresponding  Gershgorin  bound  (above). 

Theorem  3.2  shows  that  0  £  ai  <  0.8,  the  next  theorem  implies  that  the 
singular  values  cluster  at  zero. 

Theorem  3.3 

if  a.j  belong  to  or  C22 
if  belong  to  C*2  or  C21  . 

Proof: 


N/2 

l  CTi  <  In  N 

1-1 

N/2  - 

Z  at  <  In  N 
i-1  1 


See  Appendix  II.  [J 


-55- 


12  21 

Remark:  The  weaker  statement  when  ck  belong  to  C  or  C  is  stated 
because  the  proof  technique  becomes  simpler.  See  the  comments  in  Appendix 
II.  Notice  that  Theorem  3.3  implies  that  only  O(logN)  of  the  ck's  are 
outside  any  given  neighborhood  of  zero.  With  this  information  about  the 
{ a i } i  1. 1  it  is  possible  to  give  some  estimates  of  the  rate  of  convergence 
of  the  proposed  conjugate  gradient  iteration. 

Theorem  3.4 

If  the  conjugate  gradient  algorithm  is  used  to  solve  the  linear  sys¬ 
tem  T-x  =  b  with  the  splitting  T^  =  T^  -  (T^  -  T^),  then  the  initial 
error  will  be  reduced  by  a  factor  e  after  at  most 

k  -  ln(|) 


iterations. 

Proof: 

Let  Uj  <  u  <  ...  <  uN>  It  is  well  known  from  the  standard  theory 
of  the  conjugate  gradient  method  that 


£  I  y\  <;rSr) 


where  Tk  is  the  k'th  Chebychev  polynomial  of  the  first  kind. 
Tk(x)  3  cosh(k  cosh"*x)  for  x  >  1.  Therefore 


k  <  cosh^d/e) 
-rn,h-ifV% 
h  Vui 


Using  cosh“^(i)  <  ln(|o  ,  >  1  -  ,82  =  .36  ,  and 

cosh”1('I^rl|)  >  *  9ives  the  desired  result.  Q 
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This  theorem  establishes  convergence  to  any  prescribed  accuracy  in 

a  constant  number  of  iterations  independent  of  N.  Since  each  iteration 

2  2 
takes  0(N  )  arithmetic  operations  the  description  of  an  0(N  logN)  al¬ 
gorithm  for  the  first  biharmonic  problem  is  complete.  If  the  accuracy  is 
required  to  increase  with  increasing  N  as  N‘p  for  a  fixed  p,  then 
O(logN)  iterations  are  required  and  the  overall  asymptotic  operation  count 
remains  unchanged.  (In  order  to  be  consistent  with  a  decreasing  trunca¬ 
tion  error  p  =  2). 

p 

However,  under  the  above  assumptions  the  use  of  an  0(N  )  Poisson 
solver  will  not  make  the  overall  algorithm  any  faster  if  the  solution  on 
the  final  grid  is  computed  directly.  In  order  to  have  an  0(N )  method 
under  these  assumptions,  it  is  necessary  to  compute  the  solution  on  a 
sequence  of  grids,  reducing  the  error  by  a  fixed  amount  on  each  grid. 

(The  total  work  on  all  the  coarser  grids  will  only  be  0(N^).) 

For  practical  computations  (N  <_  2047)  the  use  of  the  computed  spec¬ 
tral  radius  *  .6343  for  N  =  2047  (See  Figure  3.1)  strengthens  the 
above  theorem  to 

k  <  %  In  (§)  . 

As  an  illustration  taking  e  =  10’*®,  this  estimate  gives  k  £  12. 

The  above  theorems  show  that  the  conjugate  gradient  iteration  con¬ 
verges  at  a  very  fast  linear  rate.  The  next  theorem  complements  this  by 
showing  that  asymptotically  the  rate  of  convergence  is  in  fact  superl inear. 

A  sequence  converges  R-superl inear ly  to  zero  if  and  only 

if  lim  sup  ||e.||1/k  =  0.  An  excellent  reference  discussing  the  conver- 

k-«x>  K 

gence  of  iterative  processes  is  Ortega  and  Reinboldt  [ 1970 1 . 


Theorem  3.5 

The  conjugate  gradient  method  defined  in  Theorem  3.4  has  a  R-super 
linear  rate  of  convergence. 

Proof: 

Using  the  optimality  property  of  the  conjugate  gradient  iteration 

IM  H  (ck)k  i  7ax,N  "  i-j-i  IM 

M£{ui}i=1  J=1  MJ 

where  ||ek||  is  the  error  in  the  appropriate  norm  at  iteration  k.  Let 
N 

the  set  {p.},.  -  be  ordered  such  that  \}.  £  ui+i  V  Then 


|ek!l  I  max  N 


k  u,--u 


A  '  1  '"o' 


k  aW 


<  max 


n  -Sr  lie 


oe{ai}i=k+l  j=1  l-°3 


2  i^O1 


k  o? 


i  "  N^o'l  • 


j-1  1-Oj 


Using  the  arithmetic-geometric  mean  inequality.  Theorem  3.3  and  the  fact 
that  a.  <  1  yj  gives 


< 


1  k  °i 
£  E  2 
j=1  1’aj 


IN0I 


1  In  N 

k 


l-o 


1 


Ifer 


This  inequality  shows  that  the  constant 
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ln  N 


tends  to  zero  as  k  increases  for  fixed  N. 


However,  since  the  concept  of  R-superl inear  convergence  is  most  mean¬ 
ingful  in  the  case  of  an  infinite  number  of  iterations  and  the  conjugate 
gradient  method  has  finite  termination  on  finite  dimensional  problems,  con¬ 
sider  the  limiting  case  as  N  -*•  <=°  .  Theorem  3.3  implies  that 


1 im  a  .  =  0 

k-w=o  K 


and  therefore 


Finally,  consider  the  case  where  N  /  M.  Without  loss  of  generality 
assume  6  =  (M+1)/(N+1).  (6  was  defined  to  be  ay/ax).  This  restriction 

corresponds  to  a  linear  scaling  of  one  of  the  independent  variables  in  the 
differential  equation.  Let  denote  the  M/2  by  N/2  matrix  Crs 

derived  from  an  MxN  grid.  The  following  relations  hold: 


(3.28) 


as  can  be  seen  from  the  definition  of  Crs  in  Theorem  3.1. 

Using  the  same  technique  as  in  Appendix  II,  this  time  working  with 
the  singular  values  of  all  four  matrices,  it  can  be  shown  that  the  largest 
singular  value  is  smaller  than  what  it  is  in  the  case  of  a  square  grid  with 


1 


max(N,M)  gridpoints  in  each  coordinate  direction.  (An  alternative  approach 
is  to  show  that  each  element  c^  decreases  if  M  or  N  is  reduced.  The 
spectral  radius  of  a  non-negative  matrix  is  at  least  as  large  as  that  of 
a  principal  minor,  and  it  increases  if  an  element  of  the  matrix  increases. 
This,  together  with  the  claim  about  the  element  c  ^  above,  leads  to  the 

’  s J 

desired  conclusion.)  (3.28)  shows  that  M  and  N  enter  the  problem  in  a 
completely  symmetric  way  and  it  is  sufficient  to  consider  the  case  N  £  M. 
(This  choice  saves  both  storage  and  arithmetic  operations  in  the  conjugate 
gradient  iteration.)  The  largest  singular  value  will  again  belong  to  the 
matrix  C^.  Figure  3.2  shows  the  computed  value  of  omax  for  various 
values  of  N  £  M.  The  corresponding  Gershgorin  bounds  obtained  using 

'  /  n ^rs  n  i. crs  ||  are  Sf,own  -jn  same  f-jgure  f0r  N  >  M  (the 


o  <  (  ||C 
max  -  v  11 


case  N  =  M  is  in  Figure  3.1).  Finally,  in  Figure  3.3,  the  largest  sin¬ 
gular  value  in  each  of  the  four  different  matrices  Crs  are  computed  for 
a  few  values  of  M  and  N. 

It  is  felt  that  the  computed  spectral  data  combined  with  the  theory 
in  this  chapter  provide  a  good  foundation  for  using  the  proposed  conjugate 
gradient  iteration  in  practical  computer  codes  for  the  biharmonic  equation. 

\  GERSHGORIN  BOUND 
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Figure  3.2.  Max  singular  value  and  Gershgorin  bound. 
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CHAPTER  IV 
COMPUTER  ALGORITHMS 


This  chapter  will  describe  a  few  computer  algorithms  implementing 
the  ideas  in  the  previous  chapter.  The  more  general  equation 

2 

A  u(x,y)  +  a  A  u(x,y)  +  8  u(x,y)  =  f(x,y)  (x,y)eR 

u(x,y)  =  g(x,y)  (x,y)e9R  (4.1) 

un(x,y)  =  h(x,y)  (x,y)e3R 

will  be  considered.  Efficient  methods  for  solving  a  sequence  of  such  pro¬ 
blems,  as  well  as  the  performance  of  the  proposed  algorithms  on  vector 
and  parallel  computers  will  be  described.  Numerical  results  showing  the 
stability  of  the  numerical  process  with  respect  to  roundoff  errors  can  be 
found  in  section  4.6.  Section  4.7  discusses  how  to  solve  the  discrete 
approximation  when  the  cubic  extrapolation  defined  in  (2.4),  is  used  near 
the  boundary.  An  efficient  numerical  method  for  the  solution  of  the  first 
biharmonic  boundary  value  problem  in  a  circular  disk  is  given  in  section 
4.8,  while  section  4.9  indicates  how  problems  in  different  geometries  may 
be  handled  using  conformal  mapping. 

4.1  An  algorithm  using  Fourier  transform  and  penta-diagonal  linear  systems. 

All  the  algorithms  to  be  presented  here  are  based  on  the  theory 
developed  in  Chapter  III.  Quite  a  few  arithmetic  operations  as  well  as 
storage  locations,  can  be  saved  by  paying  close  attention  to  the  way 
various  expressions  are  related.  Although  these  aspects  are  important  in 
order  to  produce  an  efficient  code,  some  details  are  omitted  in  this  pre¬ 
sentation.  The  algorithms  are  stated  in  a  form  closely  corresponding  to 
actual  computer  programs.  It  is  convenient  (but  not  necessary)  to  assume 


that  both  N  and  M  are  odd. 

A  few  definitions  are  needed  before  the  algorithms  can  be  stated: 


a  =  ( Ay ;  a 
8  5  (Ay)4  8 


Uk  5  2  +  4  fi^in^jgj-) 


qN  =  V8(N+1)  Qp 


k  =  1,2, ...M  . 


Sk  =  Pentadiag  [l.a  -  2uk>2  +  8  +  uk(uk-a),a  -  2pk,l]  + 


The  notation  (aside  from  new  definitions)  is  consistent  with  the  notation 
introduced  in  the  beginning  of  Chapter  III.  The  unnormalized  transform 
is  used  in  order  to  conform  with  the  fast  Fourier  transform  package 
written  by  Swarztrauber  [l978 ]  and  used  in  the  computation  reported  in  this 
thesis. 

Let  the  vector  f..  (i  =  1,2,...M,  j  =  1,2,..N)  represent  the  dis- 

*  J 

Crete  right  hand  side  function  in  (4.1)  and  let  the  sparse  vector  . 

*  J 

contain  the  contribution  from  the  boundary  data  g  and  h  to  the  right 
hand  side  of  the  discrete  linear  system.  The  subvector  (f^,  fi2*,fiN^ 
will  be  written  f.#,  while  (^ij  >^2j  *  *  "fMj  ^  is  written  f#j..  Also 
let  x,y,z,s  and  p  represent  five  (work)  vectors  each  of  length  N. 


Algorithm  4.1 


For  j=l,2,..N  : 


f.j  :  *  (4*>  f.j  *  V> 

f.j  :  •  f.j  • 


x  :  =  0  . 


2. 
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For  i=l,3,5,...M  : 

^  *  si1  f1. 


x  :  =  x  +  (-2 - 5 

(M+l)‘ 

i»  :  8 ( M+l )  y 


sin 


4.  Solve  a  linear  system  of  equations  Cy  =  x  using  conjugate  gradients 
preconditioned  by  the  matrix  H. 

A.  A  matrix  multiply  p  :  =  Cs  is  defined  by: 


p  :  =  s 

For  i=l,3,5,...M  : 


2  :  =  V  s 


P  :  =  P+ ( M+l  Sin  M+l^  2 


B.  A  preconditioning  step  p  :  =  Hs  is  defined  by: 
P  :  =  Qns 


p  :  =  D  p 


P  :  =  QnP  • 


The  diagonal  matrix  0  has  (precomputed)  elements  defined  by: 


M 


d.  =  8(N+1)(1  +  Spr  t.  - 5 — r= - - r— r) 

J  i=l,3,..(4sin2  5  +  p.-2)(4sin  +  u1--2-a)+B 


-64- 


For  i=l,3,5,...M  : 


x  :  ■  S^y 


fi.  :  =  fi.  -  <s1n  i£r>* 


6.  Repeat  steps  2,3,4  and  5  with  i  running  over  even  integers 
instead  of  odd.  (i=2,4,6, . . .M  everywhere.) 


7.  For  j=l,2, . . .N  : 

f.j  :  =  f.j  • 

8.  Stop.  The  discrete  solution  of  equation  (4.1)  is  now  stored 
in  the  vector  f . .. 

Remark. 

Trigonometric  functions  needed  in  the  above  algorithm  should 
be  precomputed  and  saved  in  an  array  of  size  2(N+M).  The 
two  diagonal  matrices  D  used  in  4,  must  also  be  precomputed 
requiring  an  additional  2N  storage  locations.  Notice  that 
only  a  few  vectors  of  length  N  are  needed  in  step  4;  the 
big  vector  f . .  is  never  accessed. 

'  J 

The  conjugate  gradient  iteration  will  converge  in  a  small  number  of 
iterations  as  long  as  the  corresponding  linear  system  is  positive  definite. 
That  is  certainly  the  case  as  long  as  a  £  0  and  6^0.  If  the  system 
is  indefinite  a  routine  like  SYMMLQ  by  Paige  and  Saunders  ^1975^  can  be 
substituted.  (Alternatives  are  a  least  squares  formulation  or  the  al¬ 
gorithm  given  in  section  4.3.) 

The  performance  of  algorithm  4.1  depends  on  an  efficient  solution  of 
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the  pentad i agonal  linear  systems  S.y  =  x.  Consider  first  the  case  where 
Si  is  positive  definite.  Taking  advantage  of  the  special  structure,  the 
factorization  of  this  matrix  requires  3N  operations  (2N  multiplications 
and  N  divides  plus  3N  adds)  and  2N  words  of  storage.  (The  combina¬ 
tion  of  a  multipl ication/ addition  or  a  divide/addition  will  be  considered 
one  arithmetic  operation.)  The  solution  process  after  the  factorization 
has  been  completed,  takes  4N  operations.  (4N  mult/add  only.)  One  pos¬ 
sibility  is  to  save  all  the  factorizations  when  they  are  computed  the 
first  time.  This  would  apparently  require  0(N  )  storage.  However,  a 
much  better  alternative  is  to  observe  that  the  matrix  elements  in  the  fac¬ 
tored  form  of  S.  converge.  The  rate  of  convergence  increases  with  in¬ 
creasing  i.  This  process  can  be  analyzed  and  the  final 
(converged)  values  of  the  elements  can  be  computed  directly  from  the 
matrix  Si .  Figure  4.1  illustrates  the  savings  obtained  when  the  factori- 

1C 

zation  is  computed  with  an  accuracy  of  10  .  (The  gain  is  larger  if  a 

less  accurate  factorization  is  acceptable.)  The  current  computer  implemen¬ 
tation  therefore  recomputes  this  factorization  every  time  it  is  needed. 
Alternatively  the  necessary  information  could  be  stored,  requiring  much 
less  storage  than  what  is  often  used  in  similar  codes  today.  This  techni¬ 
que  can  also  be  used  advantageously  in  fast  Poisson  solvers. 


N 

50 

100 

200 

400 

800 

1600 

(#op)/1000  using  full 
factorization.  (=  3N2/1000) 

H 

30 

120 

480 

1920 

7680 

(#op)/1000  using 
convergence.  (=  p/1000) 

3.6 

8.6 

20.0 

45.2 

100.6 

220.7 

p/(NlnN) 

18.5 

18.7 

18.9 

18.9 

18.8 

18.7 

Figure  4.1.  The  total  factorization  cost  of  all  the  matrices 
S.(i=l,2,...N)  for  the  case  a  =  6  =  0. 
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If  is  indefinite  and  a  or  6  is  zero,  then  S.  can  be  written 

si  ■  * 2  T»  <tn  =  vI  * 

where  R^  is  tridiagonal.  R*  is  positive  definite  while  R^  can  be 
viewed  as  representing  a  two  term  recursion  with  characteristic  roots  in¬ 
side  the  unit  disk.  Linear  systems  involving  RT  can  therefore  be  solved 

in  a  stable  way  by  using  a  marching  procedure.  Combining  this  with  the 

? 

Sherman-Morrison  formula,  results  in  a  stable  algorithm  requiring  9N 
operations  and  2N  storage  locations  in  order  to  solve  N  indefinite  sys¬ 
tems  each  of  size  N.  Frequently,  only  a  few  systems  are  indefinite,  re- 

2  2 

suiting  in  an  operation  count  between  4N  and  9N  . 

If  is  indefinite  with  both  a  and  6  nonzero  then  band  Gauss 

elimination  can  be  used,  but  it  is  likely  that  algorithm  4.2  or  4.3  would 
be  a  better  choice  in  this  case. 

The  sine-transform  (represented  by  the  matrix  QN)  can  be  computed 

using  a  complex  fast  Fourier  transform  of  length  N/2.  The  operation  count 

depends  on  the  prime  factors  of  N.  A  complex  fast  Fourier  transform  of 

length  N  can  be  computed  using  *sNlog9N  complex  multiplications  if 
k  k 

N  =  2  and  with  no  more  than  N  Z  (n - -1 )  complex  multiplications  if 
k  1 

N  =  n  n. 
i=l  1 

Assume  for  simplicity  that  the  number  of  real  operations  required  to 
form  y  =  Q^x  is  Nlog^N.  This  corresponds  to  the  multiplications  required 
when  N  -  2k-l,  see  Temperton  jl979] , ^198oj  for  more  detailed  operation 
counts.  With  these  assumptions,  the  total  operation  count  for  algorithm 
4.1  (ignoring  lower  order  terms)  is 


(Henrici  ^1979 J ) . 


NM(21og2M  +  5k  +  12) 


(4.2) 


asmii 
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where  k  is  the  average  number  of  conjugate  gradient  iterations  for  the 
two  linear  systems  in  step  4. 

Figure  4.2  shows  the  execution  time  of  this  algorithm  on  an  IBM 
370/168  using  the  FORTRAN  H  (Opt  =  3)  compiler.  What  is  important  is  of 
course  the  general  behavior  of  the  algorithm  rather  than  the  specific  times. 
Average  running  times  based  on  problems  2,3  and  7  on  five  different  grids, 
are  given.  The  conjugate  gradient  iteration  in  step  4  was  stopped  when  the 
2-norm  of  the  residual  fell  below  the  specified  tolerance  TOL.  This  re¬ 
sults  roughly  in  a  comparable  accuracy  of  the  final  solution  to  the  dis¬ 
crete  problem.  The  numbers  include  all  preprocessing  and  do  not  represent 
a  fully  optimized  code.  The  execution  times  have  been  split  into  the  time 
required  by  the  Fourier  transform  in  step  1  and  7  (FFT)  and  the  remainder 
(SOLV).  The  gridsizes  N  =  121  and  N  =  243  result  in  an  unfavorable 
prime  factor  of  61  when  doing  the  Fourier  transform.  The  execution  time 
increases  somewhat  slower  with  N  than  indicated  by  (4.2),  reflecting 
lower  order  contributions  omitted  in  (4.2). 


N 

63 

121 

127 

243 

255 

FFT 

136 

2386 

472 

7518 

1862 

SOLV  (TOL  =  10'5) 

408 

1434 

1355 

4774 

4953 

SOLV  (TOL  =  10'10) 

557 

1905 

1784 

6773 

6931 

TOTAL  (TOL  =  10~5) 

544 

3820 

1827 

12292 

6815 

Figure  4.2  Execution  time  in  milliseconds  for 
algorithm  4.1. 


The  average  number  of  conjugate  gradient  iterations  is  given  in  Figure  4.3. 
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N 

63 

121 

127  | 

243 

255 

TOt  -  10’5 

5 

5 

5 

5 

5 

TOL  =  10"10 

7 

8 

8 

i 

9 

9 

Figure  4.3  Average  number  of  conjugate  gradient 

iterations  required.  {Problems  2,3  &  ?) 

4.2  An  algorithm  based  on  Fourier  transformations. 

The  complete  decomposition  of  the  discrete  problem  into  four  subpro¬ 
blems  as  described  in  Chapter  III,  becomes  clearer  if  sine-transforms  are 
applied  in  both  coordinate  directions.  In  this  case  it  is  necessary  to 
solve  linear  systems  of  the  form: 

QN  Si  qN  x  =  y  .  (4.3) 

It  follows  from  Theorem  3.1  that  this  system  decouples  (odd-even)  into  two 
systems  of  half  the  size.  The  subalgorithm  PENTF  (i,r,y,x)  solves  the 
odd-numbered  equations  if  r  =  1  and  the  even-numbered  if  r  =  2.  Here 
y  and  x  are  unsealed  vectors  of  length  N/2  containing  the  appropriate 
components  from  (4.3).  (For  simplicity  both  N  and  M  are  assumed  odd, 
(N+l)/2  and  (N-l)/2  are  both  written  N/2,  the  actual  value  being  clear 
from  the  context.) 

This  algorithm  can  be  derived  from  the  decomposition  given  in  Chap¬ 


ter  III.  In  the  following  description  define  k  =  2j  -  1  if  r  =  1  and 
k  =  2j  if  r  =  2. 


Subalgorithm  PENTF  (i,r,y,x). 

1.  Define  a  (work)  vector  w  of  length  N 12  by: 

cir,  k7T 

w  .  . _ 5 _  N+T _ 

J  (4sin2  h  +  u.-2)(4sin^  H  pj-  +  U^-2-a)  +  6 


Remark. 

The  quantity  can  be  precomputed  at  the  expense  of 
of  2 M  storage  locations.  Also  the  divide  in  step  3  can  be 
changed  into  a  multiply  by  storing  1/sin  jpy  . 

The  full  algorithm  can  now  be  st  '.  Let  x,y,z,s  and  p  be  five 
(work)  vectors  of  length  N/2  dr-.  ,  k  as  above. 
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For  i=l,2,...M  : 

fi.  :  =  %  fi. 

Let  x  :  =  0  ,  r  :  = 
For  i=l,3,5,...M  : 

zj  ;  * 

PENTF  (i,r,z,z) 


x  :  =  x  +  ( 


(j«l,2,...N/2) 


sin  j*-)z 


8(N+1)  (M+l y  M+l 

fik  :  =  64 ( N+l ) ( M+l j  Zj  (j=l»2,...N/2) 

Solve  a  linear  system  of  equations  Cy  =  x  using  conjugate 
gradients  preconditioned  by  the  matrix  D 

A.  A  matrix-vector  multiply  p  :  =  Cs  is  defined  by 

p  :  =  s 

For  i=l,3,...M  : 

PENTF(i,r,s,z) 

P  :  =  P  +  (j^r  sin2  ^-)z  . 

B.  A  preconditioning  step  p  :  =  D  ^s  is  defined  by: 


PJ  :  =  dk  sj 


(J-1.2....N/2) 


where  the  diagonal  matrix  D  is  defined  in  algorithm 
4.1  step  4.  (Note  that  the  summation  in  that  definition 
extends  over  odd  or  even  i.) 

For  i=l,3,5,...M  : 

PENTF(i,r,y,x) 


fik  :  '  fik  -  s1"  Sr  xj  — n/2) 


r 
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6.  Repeat  steps  2,3,4  and  5  with  i  running  over  even  integers 
instead  of  odd.  (i  =  2,4,6, ...M  everywhere). 

7.  Let  x  :  =  0  ,  r  :  =  2  ,  repeat  steps  3,4,5  and  6. 

8.  For  i=l,2,...M  : 


f . 

l  • 


Qn  fi< 


For  j=l,2,...N  : 
f  .  :  =  QM  f  .  . 

•J  M  *J 

Stop.  The  discrete  solution  of  equation  (4.1)  is  now  stored 
in  the  vector  f... 

*  J 


Remark: 


This  algorithm  has  no  restrictions  on  the  parameters  a  and 
8,  but  rapid  convergence  of  the  conjugate  gradient  algorithm 
is  only  guaranteed  when  the  corresponding  linear  systems  are 
definite.  (See  the  discussion  in  section  4.1). 

Algorithm  PENTF ( i , j,x,y)  requires  5N  operations  if  x  and  y  are 
N-vectors.  Using  the  same  assumptions  as  in  section  4.1,  the  (asymptotic) 
operation  count  for  algorithm  4.2  is: 

NM(21og2NM  +  6k  +  14)  (4.4) 


where  k  is  the  average  number  of  conjugate  gradient  iterations  for  the 
four  systems.  Results  for  this  algorithm  corresponding  to  figure  4.2  are 
given  in  figure  4.4,  and  the  average  number  of  conjugate  gradient  itera¬ 
tions  is  given  in  figure  4.5. 


. ..  -  -  - 
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N 

63 

121 

127 

243 

255 

FFT 

292 

4792 

1034 

15442 

4316 

SOLV  (T0L=10~5) 

367 

1280 

1394 

4890 

5392 

SOLV  (T0L=10'id) 

501 

1688 

1869 

6601 

7272 

Total  (T0L=lCi  ) 

659 

6072 

2428 

20332 

9708 

Figure  4.4  Execution  time  in  milliseconds  for 
algorithm  4.2. 


N 

63 

121 

127 

243 

255 

T0L  =  10"5 

3 

3 

3 

3 

3 

o 

rH 

1 

o 

l“t 

If 

p 
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Figure  4.5  Average  number  of  conjugate  gradient 

iterations  required.  (Problems  2,3  &  7) 


Algorithm  4.2  is  a  very  good  alternative  to  algorithm  4.1,  in  parti¬ 
cular  if  a  sequence  of  problems  are  being  solved  and  it  is  possible  to 
work  with  Fourier  transformed  variables.  If  this  is  the  case  the  cost  of 
the  Fourier  transforms  may  be  ignored.  This  is  certainly  the  case  when 
computing  discrete  approximations  to  the  eigenvalues  and  eigenfunctions 
of  the  continuous  problem.  It  should  also  be  pointed  out  that  symmetries 
in  a  given  problem  may  greatly  reduce  the  computational  work,  since  the 
vector  x  in  step  4  then  often  will  be  zero.  (Resulting  in  a  trivial 
problem.)  In  this  respect  the  numerical  algorithm  can  be  viewed  as  an 
efficient  numerical  implementation  of  the  decomposition  of  the  space  of 
solutions  to  the  first  biharmonic  problem  into  four  orthogonal  subspaces. 

(See  Aronszajn,  Brown  and  Butcher  ^1973 J ,  Vaughan  ^  1974 J  and  Fichera  ^1966*] ) . 
This  property  is  further  discussed  in  the  beginning  of  Chapter  V. 
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4.3  An  efficient  direct  method. 

Algorithm  4.2  can  also  serve  as  a  basis  for  a  direct  method.  In¬ 
stead  of  using  conjugate  gradients  to  solve  the  four  linear  systems  of 
order  N/2,  in  step  4,  the  systems  can  be  solved  by  using  symmetric  Gauss 
elimination.  If  an  indefinite  symmetric  solver  is  used,  (Aasen  [l97lj , 
Bunch  and  Parlett  ^197lj)  then  all  nonsingular  discrete  analogs  of  (4.1) 
can  be  solved. 

Algorithm  4.3 

This  algorithm  is  identical  to  algorithm  4.2  except  for  step  4. 

4.  A.  Generate  the  elements  of  the  matrix  C. 

B.  Factor  the  matrix  C  using  a  symmetric  factorization. 

C.  Solve  the  linear  system  Cy  =  x  using  the  computed 
factorization  of  C. 

If  a  sequence  of  problems  with  the  same  parameters  a  and  3,  are 
solved  on  the  same  grid  then  steps  4A  and  4B  need  not  be  repeated.  The 
computer  implementation  of  this  algorithm  uses  the  routines  DPPFA  and 
DPPSL  or  DSPFA  and  DSPSL  from  UNPACK  (Dongarra,  Bunch,  Moler  and  Stewart 
[1979]). 

There  remains  to  show  how  to  generate  the  matrix  elements.  The 
columns  of  C  are  determined  by  repeated  use  of  step  4A  in  algorithm  4.2, 
choosing  the  vectors  s  as  the  columns  of  the  identity  matrix.  Exploit¬ 
ing  symmetry  and  the  fact  that  s  is  sparse,  all  four  matrices  can  be 
generated  using  MN  /4  +  0(MN)  arithmetic  operations.  The  same  opera- 
tioncount  results  if  the  matrices  Crs  given  in  theorem  3.1,  are  gener¬ 


ated  and  then  multiplied  together.  In  fact,  the  two  processes  are  equiva¬ 
lent.  The  factorization  cost  in  step  4B,  for  all  four  matrices  is 
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3  2 

N  / 12  +  0 ( N  ).  The  extra  storage  needed  in  order  to  save  all  four  fac- 

2 

torizations,  in  steps  4A  and  4B,  is  only  N  /2.  The  leading  term  in  the 
operationcount  for  algorithm  4.3  is 

is  NM(N  +  ]•  N2/M  +  81og2NM)  +  0(NM)  (4.5) 

for  the  first  right  hand  side,  and 


NM(21og2NM  +  N/M  +  14)  +  0(N)  +  0(M)  (4.6) 

for  additional  right  hand  sides.  Figure  4.6  gives  the  execution  time  for 
this  algorithm  on  a  VAX-11/780  computer.  In  order  to  make  comparisons 
with  the  previous  algorithms  easier,  the  first  row  in  the  table  has  been 
compared  with  the  corresponding  row  in  figure  4.4  and  all  other  entries 
are  given  in  approximate  IBM  370/168  time  using  this  normalization. 


N 

63 

127 

255 

VAX  FFT 

2672 

11426 

49840 

FFT 

292 

1034 

4316 

SOLV 

(4.5) 

353' 

1592 

9707 

SOLV 

(4.6) 

120 

398 

1616 

Total 

(4.6) 

412 

1432 

5932 

Figure  4.6  Normalized  execution  time  in  milliseconds 
for  algorithm  4.3. 


Remarks. 

i)  The  algorithms  presented  in  sections  4.1,  4.2  and  4.3 
have  been  stated  in  order  to  show  the  structure  and  sim¬ 
plicity  of  a  possible  computer  implementation  reflecting 


d 


A*  ■ 
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the  structure  and  simplicity  of  the  underlying  theory 
developed  in  Chapter  III. 

ii)  The  algorithms  can  be  improved  upon  when  solving  spe¬ 
cial  cases  of  (4.1).  For  example,  an  even  better  pre¬ 
conditioning  matrix  can  be  constructed  when  a  =  6  =  0, 
by  incorporating  the  knowledge  from  Lemma  A2.2.  Recall 
however,  that  such  improvements  although  important  in  some 
applications,  only  reduces  the  constant  in  the  operation- 

count,  The  complexity  of  algorithms  4.1  and  4.2  using  an 
2  2 

0(N  )  Poisson  solver,  is  0(N  )  and  this  result  is  op¬ 
timal  . 

iii)  Algorithms  4.1  and  4.2  will  execute  faster  if  it  is  ac¬ 
ceptable  to  use  more  storage  for  intermediate  results. 

If  the  vector  w  in  subalgorithm  PENTF  is  precomputed  and 
stored  (requiring  0(NM)  storage),  then  the  operation- 
count  of  this  important  subroutine  is  reduced  from  5N  to 
3N. 

iv)  A  direct  method  based  on  algorithm  4.1  has  an  operation- 
count  of 


NM(21og2M  +  N/M  +  12)  .  (4.7) 

This  is  somewhat  faster  than  (4.6),  but  the  algorithm  is  not  as  general. 

4.4  The  solution  of  several  problems  on  the  same  grid  using  conjugate 
gradients. 

In  this  section  it  is  shown  that  a  small  modification  of  algorithms 
4.1  and  4.2  can  reduce  the  computational  cost  when  solving  a  sequence  of 
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problems  on  the  same  grid.  There  are  several  situations  where  this  can 

2 

be  of  interest.  When  solving  very  large  systems  the  0(N  M)  preprocess¬ 
ing  cost  of  algorithm  4.3  may  be  unacceptable.  Perhaps  more  important, 
the  technique  described  in  this  section  can  be  used  when  solving  a  se¬ 
quence  of  problems  of  the  form  (4.1),  allowing  not  only  f,g,  and  h,  but 
also  the  parameters  a  and  6  to  change. 

Consider  the  following  algorithm  for  solving  a  symmetric,  positive 
definite  linear  system  Ax  =  b.  Given  Xq  and  Hq  let: 


x_ .  i  =  x  +  a  D 
n+1  n  nrn 


»  = 


_  (Hn-lVrn) 


n  '  <Ap„.Pn) 

-Hr  +BD  ft  - 

n+1  Hrt  n+1  6npn  •  6n  (VlVrn) 


(4.8) 


Vl  *  Hn  +  8(pn,Ap„,Hn) 

where  rn  =  b  '  AV  p0  =  H0r0  and  H-1  =  H0  * 

If  Hq  =  I  and  U  =  0  this  is  the  conjugate  gradient  method  (3.18). 

If  Hq  i  I  and  U  =  0  it  is  preconditioned  conjugate  gradients  with  a 

preconditioning  matrix  Hn.  If  U  =  U  (s,y,H)  is  any  member  of  the 

Broyden  [l97o]  B-class  of  quasi-Newton  updates,  then  in  exact  arithmetic, 

this  algorithm  generates  the  same  sequence  {xn>  as  when  U  =  0. 

(Nazareth  ^1979]).  In  finite  precision  calculations,  the  choice 

U  =  U  (s,y,H)  results  in  a  more  stable  algorithm  when  solving  problems 
6 

similar  to  (3.16)  or  (3.17)  avoiding  the  characteristic  loss  of  ortho¬ 
gonality  (Parlett  ^198oj),  that  can  affect  the  rate  of  convergence  of  the 
conjugate  gradient  method.  The  process  (4.8)  can  be  viewed  as  a  conjugate 
gradient  method  with  a  variable  preconditioning  matrix  (a  variable  metric) 
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u  ^ 

making  the  matrix  H*  A  increasingly  more  well  conditioned.  This 
observation  shows  that  a  sequence  of  problems  can  be  solved  more  efficient¬ 
ly  provided  that  the  information  stored  in  Hn  from  a  previous  iteration, 
is  saved. 

Since  all  the  updates  U^(s,y,H)  are  equivalent  in  exact  arithme¬ 
tic,  the  symmetric  rank  one  update  given  by 

(s-Hy)(s-Hy)T 

U<;oi(s,y,H)  =  - ¥ -  .  (4.9) 

SR1  (s-Hy)  y 

seems  to  be  best  suited  in  this  particular  situation.  At  every  iteration 
only  one  new  vector  v„  =  (s  -H  v  )  must  be  stored.  This  is  an  N-vector 
in  algorithm  4.1  and  a  vector  of  length  N/2  in  algorithm  4.2.  Algorithm 
4.4  outlines  how  this  technique  can  be  used  when  2K  vectors  of  length 
N  are  available. 

Algorithm  4.4. 

A.  For  each  new  problem  apply  algorithm  4.1  or  4.2,  but  use  version 
(4.8)  of  the  conjugate  gradient  method  in  step  4.  Initially  the 
matrix-vector  product  p  :  =  HgS  is  defined  in  step  4B,  but  at 
step  n,  n  <_  K  it  will  be  given  by 

p  :  =  Hqs  +  L  (Y-vls)v,  (4.10) 

i=l  1 

where  T i  is  the  scaling  factor  from  (4.9)  and  the  vector  vi  is 
a  stored  update. 

B.  When  a  total  of  K  conjugate  gradient  iterations  have  been  performed, 
(possibly  after  solving  more  than  one  of  the  problems  in  the  sequence) 
continue  with  U  ;  0  and  use 


- ...  ...  ■  >  Wf 
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P  : 


(4.11) 


in  step  4B. 

Only  a  few  updates  are  needed  in  order  to  achieve  convergence  in 
one  or  two  iterations  and  the  cost  of  this  procedure  does  not  add  to  the 
leading  terms  in  the  operationcounts  of  algorithms  4.1  and  4.2.  Algorithm 
4.4  can  be  used  advantageously  even  when  the  parameters  a  and  6  in 
(4.1)  are  changing.  However,  in  this  case  it  may  sometimes  be  necessary 
to  restart  with  Hq  as  defined  in  step  4B  of  algorithms  4.1  and  4.2. 

1979 J  showed  that  an 
interesting,  alternative  updating  strategy  is  possible. 

Algorithm  4.4  in  combination  with  both  4.1  and  4.2,  was  tried  taking 
K  =  5.  All  10  test  problems  were  solved  on  an  IBM  370/168  with  a  stop- 
ping  tolerance  TOL  =  10  .  The  average  time  per  problem  and  the  average 
number  of  conjugate  gradient  steps  needed  are  given  in  figures  4.7  and 
4.8.  For  comparison,  the  same  sequence  of  problems  were  solved  (on  a 
VAX-11/780)  using  algorithm  4.3.  The  normalized  IBM  times  are  given  in 


Figure  4.7  Average  execution  time  (ms)  per  problem 
when  solving  10  problems  with  algorithm 
4. 4/4.1. 


If  the  BFGS  update  Ugpgg  is  used,  then  Nocedal  £ 
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N 

63 

127 

255 

FFT 

290 

1034 

4308 

SOLV 

291 

1055 

4037 

TOTAL 

581 

2089 

8345 

#Cg-iterations 

1.4 

_ 

1.6 

1.7 

Figure  4.8  Average  execution  time  (ms)  per  problem 
when  solving  10  problems  with  algorithm 
4. 4/4. 2. 


N 

65 

127 

255 

VAX  FFT 

2672 

11426 

49840 

FFT 

290 

1034 

4308 

SOLV 

142 

521 

2421 

TOTAL 

432 

1555 

6729 

Figure  4.9  Normalized  execution  time  (ms)  per  problem 
when  solving  10  problems  with  algorithm  4.3. 

Figure  4.7  and  4.8  show  that  the  number  of  conjugate  gradient  itera¬ 
tions  decreases  significantly.  Figure  4.9  and  4.6  show  that  the  prepro¬ 
cessing  cost  of  algorithm  4.3  is  also  quite  acceptable  for  this  problem. 
It  should  be  noted  that  the  total  cost  when  solving  10  problems  using  the 
three  different  methods  are  almost  equal.  In  fact,  comparing  expressions 
(4.5)  and  (4.6)  with  (4.4)  indicate  that  the  two  methods  are  equally 
efficient  when 
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problems  are  being  solved.  In  (4.12)  k  is  the  average  number  of  conju¬ 
gate  gradient  iterations  required  per  problem.  Taking  N  =  255  and 
k  =  1.7,  the  two  methods  should  be  equally  efficient  when  solving  8  pro¬ 
blems  and  this  corresponds  well  with  the  computational  results. 

4.5  Algorithms  for  vector  and  parallel  computers. 

Sameh,  Chen  and  Kuck  [l976 J  considered  algorithms  for  Poisson's  equa¬ 
tion  and  the  biharmonic  equation  under  the  assumptions  of  an  "idealized 

2  3 

parallel  computer"  having  N  or  N  processors.  They  concluded  that  the 
biharmonic  equation  was  an  order  of  magnitude  more  difficult  than  Poisson's 
equation.  This  section  gives  new  and  improved  results,  as  well  as  more 
practical  results  for  the  case  when  a  fixed  number  of  processors  and/or  a 
vector  computer  is  available.  Without  loss  of  generality,  the  discussion 
is  limited  to  algorithm  4.2  with  N  =  M. 

While  truly  parallel  computers  having  p  independent  processors  with 
unrestricted  communication,  exist  mostly  as  theoretical  models,  vector  com¬ 
puters  capable  of  performing  arithmetic  operations  on  vector  registers, 
play  an  increasingly  more  important  role  in  current  large  scale  scientific 
computations.  An  algorithm  for  the  biharmonic  equation  on  such  a  computer 
will  therefore  be  considered  first.  The  following  simplifying  assumptions 
are  made: 

i)  There  are  p  processors  available. 

ii)  The  four  arithmetic  operations  +,  and  /  can  be  per¬ 
formed  by  these  processors  working  on  vector  registers, 

iii)  An  operation  (or  a  timestep)  will  consist  of  an  addition 
or  subtraction  and  a  multiplication  or  a  divide  performed 
componentwise  on  vectors  of  length  at  most  p. 
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iv)  Startup  costs  including  memory  and/or  data  alignment 
times  are  ignored. 

These  assumptions  are  naturally  not  fully  realistic,  but  different  machine 
architectures  make  it  difficult  to  use  a  more  complicated  model.  An  al¬ 
gorithm  that  performs  well  under  the  above  assumptions  is  likely  to  also 
be  very  efficient  on  pipelined  vector-computers  like  the  CRAY-1.  Despite 
having  only  one  processor,  this  computer  performs  vectoroperations  so 
efficiently  that  the  model  can  be  used  with  an  effective  p  larger  than 
one.  Alternatively,  the  cost  of  a  vector  operation  can  be  measured  as 
S  +  Rp  where  S  is  a  startup  time  and  R  is  the  vector-rate.  It  will 
be  clear  from  the  discussion  that  a  more  detailed  analysis  for  a  specific 
computer  can  be  carried  out. 

Consider  an  algorithm  where  the  vectorization  is  performed  on  the 
inner  loops  of  algorithm  4.2  whenever  possible,  resulting  in  an  algorithm 
closely  related  to  the  sequential  method.  Assume  that  p  ^  N  processors 
are  available.  The  description  references  the  steps  of  algorithm  4.2. 
Algorithm  4.5. 

Steps  1  and  8. 

a)  The  setup  time  scaling  the  right  hand  side,  takes  N  /p 
timesteps  using  p  <_  N  processors. 

b)  The  remainder  of  step  1  and  all  of  step  8  is  computed  by 

using  a  sequential  fast  Fourier  transform  algorithm  on 

p  independent  vectors  in  parallel.  The  total  time  for 
2 

this  is  4N  logN/p  using  p  £  N  processors. 

Steps  3  and  5. 


a)  The  subalgorithm  PENTF  (i,r,y,x)  as  stated  in  section 
4.2,  consists  of  two  vector  operations  forming  w,  two 
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vector  inner products  when  computing  c^  and  c,,,  and 
finally  two  vector  operations  when  calculating  x.  All 
vectors  in  this  subroutine  have  length  N/2.  Assuming 
that  an  innerproduct  between  two  vectors  of  length  N 
requires  N/p  +  logp  vector  operations,  the  cost  of 
PENTF  (i,r,y,x)  is: 

i)  3N/p  +  21ogp  if  no  preprocessing  is  done, 

ii)  5N/2p  +  logp  if  is  precomputed, 

iii)  3N/2p  +  logp  if  and  w  are  precomputed. 

Notice  that  these  results  are  valid  for  p  <_  N/2. 
b)  Therefore,  assuming  (as  in  section  4.2)  that  only  c^ 
is  precomputed,  steps  3  and  5  require  approximately 
N(13N/p  +  41ogp)  timesteps  with  p  £  N. 

Step  4. 

a)  Using  the  same  assumptions,  a  matrix-vector  product  takes 
N 

^(6N/p  +  21ogp)  timesteps  using  p  £  N/2  processors. 

b)  All  other  operations  in  the  conjugate  gradient  iteration, 
including  the  preconditioning  step,  can  be  performed  in 
0(N/p)  +  O(logp)  timesteps  per  iteration.  Since  only  k 
iterations  are  needed,  k  independent  of  N,  the  cost 
of  step  4  solving  all  four  linear  systems,  is  approxi¬ 
mately  kN(6N/p  +  21ogp)  with  p  £  N.  The  total  time 
required  for  this  algorithm  is  therefore 

2 

•jjj-(21ogN2  +  6k  +  14)  +  2N  ( k+2 )  logp  .  (4.13) 

Notice  that  the  first  term  in  this  expression  is  (4.4)  divided  by  p. 
For  large  N  and  p  <<  N  the  speedup  is  very  close  to  p.  If  p  =  N 
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the  operation  count  becomes 

N(2(k+4)1ogN  +  4k  +  10)  ,  (4.14) 

since  the  logp  term  arising  from  inner  products  of  vectors  of  length 
N/2,  never  exceeds  log(N/2).  In  this  case  the  speedup  is  proportional 
to  N/k. 

As  an  illustration,  a  computer  implementation  of  algorithm  4.2  was 
tried  on  an  IBM  370/168  and  also  on  the  CRAY-1  computer.  Figure  4.10  dis¬ 
plays  some  timing  results. 


N  = 

255 

N  = 

511 

SOLV 

FFT 

SOLV 

FFT 

IBM  370/168 

7109 

4324 

CRAY-1 

OFF  =  v 

1804 

882 

7299 

3506 

CRAY-1 

ON  »  v 

251 

548 

878 

2148 

Figure  4.10  Time  in  milliseconds  to  solve  the 

biharmonic  equation  on  an  N  by  N  grid. 


Remarks. 


i)  The  total  solution  time  is  the  sum  of  the  time  spent  in 
a  fast  Fourier  transform  routine  (FFT)  and  in  the  re¬ 
mainder  of  the  code  (SOLV). 

ii)  The  FORTRAN  H(0PT=3)  compiler  was  used  on  the  168, 
while  the  CFT  compiler  on  the  CRAY-1  was  used  with  and 
without  the  vectorization  option.  (ON  =  v  and  OFF  =  v). 

iii)  The  same  FORTRAN  source  code  was  used  in  all  three  cases 
with  the  single  exception  that  a  special  vector  inner  pro¬ 
duct  routine  written  by  Oscar  Buneman  [l98o] ,  was  used  in 
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the  vectorized  run. 

iv)  The  iterative  part  of  the  algorithm  was  terminated  when 
the  2-norm  of  the  residual  fell  below  the  tolerance 
TOL  =  10"10. 

v)  No  attempt  was  made  to  optimize  the  code  by  avoiding  nonvec- 
torizable  0(N)  contributions  to  the  execution  time.  In 
particular,  the  Fourier  transform  part  of  the  code  was  not 
implemented  as  in  algorithm  4.5,  and  therefore  executes 
slowly. 

vi)  Notice  the  substantially  improved  execution  time  for  the 
SOLV-part  when  vector ization  is  turned  on.  The  speedup 
compared  with  scalar  processing,  is  between  seven  and  eight, 
while  the  Fourier  transform  routine  only  gains  a  factor 
1.6.  The  algorithm  is  sufficiently  parallel  in  its  struc¬ 
ture  that  a  FORTRAN  program  written  for  a  sequential  com¬ 
puter,  immediately  speeds  up  when  given  to  the  CFT-compiler 
on  the  CRAY-1. 

An  alternative  to  algorithm  4.5  is  to  perform  all  the  vectorizations 
on  the  outer  loops.  The  resulting  code  will  differ  more  from  a  sequen¬ 
tial  implementation,  but  it  avoids  the  difficulty  with  the  vector  inner- 
products  in  algorithm  4.5.  The  following  is  a  brief  description  again 
refering  to  algorithm  4.2.  Steps  1  and  8  will  be  as  in  algorithm  4.5. 

In  steps  3  and  5  the  vectorization  must  be  performed  on  the  index  i, 

2 

resulting  in  a  cost  of  13N  / p  timesteps.  Similarly  in  step  4,  the  cost 

2 

of  a  matrix-vector  product  becomes  3N  / 2p  timesteps  using  p  <_  N/2.  The 

2 

cost  of  solving  all  four  systems  is  therefore  6kN  /p  resulting  in  a 
total  cost  of 
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(21ogN2  +  6k  +  14)  .  (4.15) 

Comparing  this  with  (4.4)  shows  that  an  optimal  speedup  of  p  has  been 
achieved. 

2 

Finally,  consider  the  case  where  a  parallel  computer  with  N  pro¬ 
cessors,  as  described  in  Sameh,  Chen  and  Kuck  [l976j,  is  being  used.  Us¬ 
ing  their  results  on  computing  the  fast  Fourier  transform,  and  a  combination 
of  the  two  algorithms  outlined  in  this  section,  it  can  be  shown  that  a 
method  based  on  algorithm  4.2  can  be  executed  in  O(logN)  timesteps.  This 
is  an  order  of  magnitude  faster  than  results  of  Sameh,  Chen  and  Kuck  and 
its  complexity  is  the  same  as  that  of  a  Poisson  solver  under  similar  assump¬ 
tions. 

Remark. 

The  main  purpose  of  this  sect'on  is  to  outline  results  when  using  mul¬ 
tiprocessor  computers.  These  results  can  also  be  useful  when  considering 
algorithms  for  vector  computers.  The  particular  operation  counts  are  of 
the  right  order  of  magnitude,  but  the  constants  can  certainly  be  improved 
by  departing  in  certain  respects  from  the  particular  underlying  sequential 
algorithm  4.2. 

4.6  Roundoff  errors. 

The  numerical  algorithms  proposed  in  this  Chapter,  are  all  solving  a 
linear  system  of  equations  with  coefficient  matrix  A  given  by  (3.2). 

4 

The  condition  number  of  this  matrix  is  proportional  to  N  .  Classical 
theory  for  linear  equations  (Wilkinson  ^1965]),  shows  that  there  exist  a 
right  hand  side  and  a  perturbation  of  this  vector,  that  result  in  errors 
in  the  solution  proportional  to  the  condition  number  times  the  original 


■  :TSw  j. 


-87- 

These  questions  have  not  received  much  attention,  mainly  because  the 
truncation  error  usually  is  more  important  as  long  as  the  numerical  method 
is  stable.  With  the  development  of  fast  methods,  in  particular  for  fourth 
order  problems,  this  is  no  longer  necessarily  true. 

in  order  to  study  the  roundoff  error,  algorithm  4.3  was  tried  in  both 
single  and  double  precision.  The  difference  between  the  two  results  were 
computed  for  the  five  first  test  problems.  The  maximum  roundoff  error  on 
the  grid,  propagating  from  the  inexact  representation  of  the  right  hand 
side  in  single  precision  is  shown  in  figure  4.11.  The  computer  used  was 

a  VAX-11/780  with  single  precision  machine  epsilon  of  3-10  . 

4  2 

Lines  indicating  growth  in  errors  proportional  to  N  and  N  are 
indicated  as  references.  The  figure  shows  a  difference  between  highly 
symmetric  problems  (1,  4  and  5)  and  more  general  problems  like  2  and  3. 

The  analysis  of  this  difference  will  not  be  pursued  here,  but  it  seems 
clear  that  it  is  related  to  the  fact  that  most  of  the  linear  systems  in 
step  4  of  algorithm  4.2  are  essentially  trivial  in  the  symmetric  cases. 

4 

The  figure  indicates  that  the  roundoff  error  stays  well  below  the  N 
reference  lines  in  all  cases. 
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4.7  Efficient  solution  of  the  discrete  system  when  the  cubic  extrapolation 

scheme  is  used  near  the  boundary. 

The  stencil  (2.4)  employing  cubic  extrapolation  near  the  boundary, 
leads  to  a  nonsymmetric  linear  system  of  equations  with  a  slightly  more  com 
plicated  structure  than  (3.2).  The  two  finite  difference  schemes  are  of 
the  same  order  of  accuracy  when  computing  the  discrete  solution  u,  but 
as  shown  in  Chapter  II,  there  are  cases  when  the  cubic  extrapolation  pro¬ 
cedure  is  preferable. 

It  has  been  claimed  (Gupta  ^1979]),  that  fast  methods  for  the  classi¬ 
cal  approximation  using  quadratic  extrapolation,  cannot  be  used  and  that 
more  general,  but  expensive  methods  are  necessary  when  the  cubic  extrapola¬ 
tion  is  used.  The  present  section  outlines  two  new  fast  methods  based  on 
the  algorithm  developed  in  Chapter  III. 

First,  consider  the  possibility  of  deriving  a  numerical  method  using 
the  same  ideas  as  in  Chapter  III.  Without  loss  of  generality  it  is  assumed 
that  N  =  M.  The  coefficient  matrix  A  can  be  written  as 

A  =  [(I<g)R)  +  (R®I)]2  +  (Tc(g)  I)  +  (I®T.)  (4.16) 

where 

Tc  =  4(elel  +  eNeN^  "  ^ele2  +  eNeN-l)  •  (4-l?) 

Tc  =  UVT 


Let 
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where 


/8  -1  0  0\ 


U  =  % 


\0  0  -1  8/Nx4 


.  V  = 


N*4 


Proceeding  as  in  Chapter  III, 

P(I®Q)A(I®  Q)PT  =  S  +  (X(g)I)(Y0I)T  (4.18) 


where  X  =  QU  and  Y  =  Q V  . 

A 

S  is  blockdiagonal  and  each  block  has  a  pentadiagonal  slightly  nonsymme- 
tric  structure.  This  poses  no  real  difficulty  and  fast  methods  for  solv¬ 
ing  linear  systems  involving  S  can  be  devised.  The  matrix  B  (3.11), 
now  takes  the  form 

B  =  I  +  (Y0I)T  S-1(X(2) I )  •  (4.19) 

This  matrix  has  a  4*4  blockstructure  with  blocksize  N.  By  going 
through  the  same  calculations  that  led  to  (3.16)  and  (3.17),  this  matrix 
decouples  into  two  2*2  block  matrices.  The  systems  can  be  further  re¬ 
duced  by  one  step  of  block  Gaussian  elimination.  The  resulting  N  x  N 
matrices  can  be  generated  and  then  factored  using  the  LU  -  decomposition. 
Another,  perhaps  more  interesting  idea  is  to  solve  these  systems  using  an 
iterative  method.  As  an  example,  one  of  the  linear  systems  that  must  be 
solved  has  the  following  structure 

(I  +  N?T  k=1E3>5  sin2  Sk1)x  =  b 


(4.20) 
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where  $k  is  the  k-th  block  of  S. 

Proceeding  as  in  Chapter  III,  a  good  preconditioning  matrix  will  be 
obtained  by  replacing  by  S^.  Since  the  problem  is  nonsymmetric,  the 
conjugate  gradient  method  cannot  be  used  in  the  same  way  as  before.  The 
normal  equations  can  always  be  used  in  combination  with  the  conjugate  gra¬ 
dient  method,  but  such  an  approach  is  unnecessarily  expensive  when  dealing 
with  this  problem.  Various  extensions  of  the  conjugate  gradient  method 
have  been  considered,  see  Kincaid  and  Young  |l98oj  and  others.  Theoreti¬ 
cal  understanding  of  these  methods  is  not  complete. 

Consider  the  class  of  quasi-Newton  updates  proposed  by  Broyden  ^1965j , 
for  solving  systems  of  nonlinear  equations.  Given  a  nonsingular  nxn  matrix 
Hg,  an  initial  guess  xQ  and  rg  =  Axg  -  b,  for  k  =  0,1,2...  let 

sk  =  •  Hkrk  ’ 

Vl  =  xk  +  3k  • 

yk  =  Ask  ,  (4.21) 

Vi  =  rk  +  yk  ■ 

Hk+1  =  Hk  +  (sk'Hkyk)vk  ’ 
where  v k  yk  =  1  and  v  k  H'1  sk  ?  0  . 

Gay  ^1979j  has  shown  that  this  process  converges  in  at  most  2n  steps  to 
the  solution  of  the  linear  system.  Moreover,  computational  experience 
indicates  that  convergence  often  is  very  rapid  when  the  spectrum  of  AHQ  is 
clustered.  The  symmetric  rank  one  update  discussed  in  section  4.4,  is  ob¬ 
tained  by  setting  v  =  (s-H^)/yT(s-H^)  and  this  method  was  tried  when 
solving  nonsymmetric  systems  like  (4.20).  When  using  the  proper  precondi¬ 
tioning  matrix  Hg,  convergence  is  very  rapid.  This  is  plausible  since  the 
algorithm  converges  in  at  most  2p  steps  if  AHQ  has  only  p  distinct  eigenvalues. 
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A  limited  storage  version  of  (4.21)  can  be  implemented  requiring  only 
the  two  matrix-vector  products  y  =  As  and  v  =  Hy  in  addition  to  a  few 
vector  operations  per  iteration.  The  required  storage  is  only  three  vec¬ 
tors  in  addition  to  K  (K  ^  1)  vectors  for  the  updates.  This  also  suggests 
an  alternative  way  of  solving  the  original  discrete  problem  by  applying  al¬ 
gorithm  (4.21)  directly,  defining  HQ  by  one  of  the  fast  algorithms  dis¬ 
cussed  in  the  beginning  of  this  Chapter.  This  technique  was  used  when  pro¬ 
ducing  the  results  in  Chapter  II.  Figure  4.12  shows  the  number  of  iterations 

required  to  solve  problems  1  and  2  on  four  different  grids. 

2 

Only  5  updates  and  a  total  storage  requirement  of  8N  was  used.  The 
iteration  was  stopped  when  |j Ax-b  || 2  <  (N+1)T0L. 


Figure  4.12  Number  of  iterations  required 
when  using  Broyden's  method. 

After  5  iterations  the  method  was  restarted  with  Hg  =  Hq.  Experiments 
show  that  this  is  more  efficient  than  using  Hk+1  s  Hk  for  k  >_  5.  No¬ 
tice  that  the  number  of  iterations  tend  to  decrease  with  increasing  N. 
The  required  work  to  solve  this  problem  is  therefore  of  the  same  order  as 
in  the  case  when  a  quadratic  boundary  extrapolation  is  used. 

Remarks. 

i)  Both  Hq1  and  A  are  discrete  approximations  to  the  bihar¬ 
monic  operator,  but  the  approximations  are  different  and  not 
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necessarily  very  accurate  ((2.3)  and  (2.4))  near  the  boundary, 
ii)  Broyden's  method  used  in  this  manner  promises  to  be  ad¬ 
vantageous  in  other  similar  situations  as  well, 
iii)  The  use  of  Broyden's  method  to  solve  large  nonsymmetric 
systems  of  linear  equations  appears  to  be  new.  The  symme¬ 
tric  rank  one  update  is  of  particular  interest  in  this  con¬ 
text,  since  it  also  belongs  to  the  Broyden  B-class  and 
therefore  is  related  to  the  conjugate  gradient  method  in  the 
symmetric  case.  The  method  behaves  very  similar  to  the  con¬ 
jugate  gradient  method,  but  can  be  used  to  handle  a  larger 
class  of  problems. 

The  symmetric  rank  one  update  has  some  theoretical  difficulties  since 
s-Hy  and  y  can  become  orthogonal.  This  never  caused  any  problems  in  the 
applications  tried  here,  but  a  study  of  theoretical  aspects  using  one  of 
Broydens  single  rank  updates,  when  solving  large  linear  systems  is  plan¬ 
ned  for  the  near  future. 

4.8  Efficient  solution  of  the  biharmonic  equation  in  a  disk. 

Consider  the  biharmonic  Oirichlet  problem  on  a  disk  of  radius  R, 

A^u  =  f  r  <  R 

u  =  g  r  =  R  (4.22) 

u  =  h  r  =  R  . 

r 

In  polar  coordinates  the  biharmonic  operator  takes  the  form 

4,e  =  F(sr<r3r>  *F9!>  F(3r<r9r>  *  F  S9>  •  (4'23> 

Glowinski  and  Pironneau  [l979]  remarked  that  a  discrete  form  of  this 
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problem  derived  from  a  finite  difference  grid  based  on  polar  coordinates, 
can  be  solved  by  using  the  "coupled  equation  approach".  (See  Chapter  I, 
section  viii}.)  The  present  section  describes  an  algorithm  which  is  an 
order  of  magnitude  faster.  Taking  advantage  of  the  explicit  formula  (1.8) 
valid  when  f  =  0,  makes  it  possible  to  design  a  direct  method  for  (4.22). 
Let 


First  solve 


u  =  +  u2 


AWj  =  f 


and  then 


AU1  »  Wj 

Uj  =  g 

The  problem  for  u2  becomes 

A2 u2  *  0 
u2  =  0 


(L'2)r  *  h  -  (u1 ) 


l'r 


r  <  R 
r  =  R 

r  <  R 
r  =  R 

r  <  R 
r  =  R 
r  =  R 


Now  write 


u2  =  -  (R2-r2)Vj  +  v2 


(4.24) 


(4.25) 


(4.26) 


and  require  that  Vj  and  v?  be  harmonic  (see  1.7).  Since  v2  vanishes 
at  the  boundary,  it  follows  that  it  is  identically  zero.  Now 

3u, 


and  therefore 


<5T>  .*  2Rvl 


r=R 


Av^  =  0 

V1  =  Ir  (u2}. 


r  <  R 
r  -  R 


(4.27) 


In  this  way  the  numerical  solution  of  (4.22)  has  been  reduced  to  the  solu¬ 
tion  of  three  Poisson  equations  on  the  same  grid.  The  derivative  (u^)r 
which  is  needed  in  (4.26)  must  be  computed  with  sufficient  accuracy  from 
the  solution  of  (4.25).  If  a  second  order  method  is  used  for  solving  Pois¬ 
son's  equation  then  the  discrete  value  of  (u^)r  should  also  be  second 
order  accurate.  A  second  order  accurate  numerical  solution  to  the  original 
(smooth)  problem  (4.22)  can  then  be  obtained. 

A  computer  implementation  using  the  subroutine  PWSPLR  (Swarztrauber 
and  Sweet  ^1975 J )  for  Poisson's  equation,  has  been  written.  The  algorithm 
has  an  operation  count  of  O(NMlogN)  when  a  discretization  with  N  points 
in  the  e-direction  and  M  points  in  the  r  -  direction  is  used.  A  some¬ 
what  faster  code  requiring  less  storage,  could  be  implemented  by  taking 
advantage  of  the  zero  right  hand  side  in  (4.27).  Figure  4.13  displays  the 
results  from  a  test  using  an  IBM  370/168  computer.  The  problem  was  solved 
in  the  unit  disk  and  the  exact  solution  is  given  by  u  =  ersin0. 


M 

N 

TIME  (MS) 

MAX  ERROR 

32 

32 

483 

4.89  10'4 

64 

64 

2158 

1.22  10-4 

128 

128 

9670 

3.05  10'5 

Figure  4.13  Execution  time  and  discretization 
error  when  solving  the  bi harmonic 
equation  in  a  disk. 

It  is  easily  seen  that  the  discrete  solution  is  second  order  accurate. 
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4.9  Conformal  mapping  and  the  solution  of  the  biharmonic  equation  on 


more  general  domains. 

2 

Given  a  domain  fi  c  R  ,  with  boundary  3fi  ,  consider  the  bi harmonic 
w  w 

Dirichlet  problem, 

Awu  =  f  in  nw 

u  =  g  on  3nw  (4.28) 

u  =  h  on  3ft 

n  w 

Assume  that  it  is  possible  to  map  the  unit  disk  conformally  to  ftw,  i.e. 
the  mapping  function  or  a  sufficiently  accurate  approximation  is  known  or 
efficiently  computable.  This  is  indeed  the  case  for  all  polygons,  since 
the  map  in  this  case  called  the  Schwarz-Christoffel  transformation,  has  a 
simple  form  and  can  be  accurately  computed.  (Trefethen  ^198ol).  There  are 
also  methods  for  computing  the  map  to  more  general  domains,  see  for  ex¬ 
ample  Gaier  [l964j ,  Symm  ^1966j,  Chakravarthy  and  Anderson  [l979],  Gutknecht 
[l98o],  Fornberg  ^1980 j.  This  section  outlines  how  it  is  possible  to 
solve  (4.28)  taking  advantage  of  fast  solution  techniques  developed  for  a 
rectangular  region.  (The  conformal  map  between  a  rectangle  and  the  disk 
is  an  easy  problem).  The  advantage  of  this  approach  is  having  a  fixed 
computational  domain  where  highly  specialized  numerical  methods  can  be 
used.  However,  the  necessary  calculation  of  the  map  can  often  be  diffi¬ 
cult  and  dominate  the  computational  cost.  In  some  applications  this  can 
be  considered  as  preprocessing  if  many  problems  of  the  form  (4.28) 
are  solved  for  a  fixed  domain. 

Assume  in  what  follows  that  s(z)  maps  the  given  rectangle  confor¬ 
mally  to  the  domain  ft  .  For  any  function  f(w)  w  e  ft  ,  let  f^(z) 

W  W 

denote  the  function  such  that  f^(z)  =  f(s(z}).  First  write  equation 
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(4.28)  as  two  coupled  second  order  equations, 


-  A  u  =  v  in  a 

w  w 

-  A  v  =  f  in  a 

w  w 

u  =  g  on  9fiw 

un  =  h  on  3nw 

The  equivalent  problem  in  the  computational  domain  is 


-Azu(S>  = 

Is' (2)1 

2  V(S) 

in 

R 

It 

to 

> 

N 

< 

1 

|s'(z)| 

2  f(s) 

in 

R 

u^  . 

g($) 

on 

3R 

u<*>  - 
n 

|s'(z)| 

h(s) 

on 

3R 

(4.29) 


(4.30) 


This  problem  can  be  discretized  using  the  standard  stencils  discussed  in 
Chapter  II.  Let  S  be  the  positive  diagonal  matrix  containing  the  dis¬ 
crete  values  of  |s'(z)|  on  the  gridpoints.  If  the  quadratic  extrapola¬ 
tion  scheme  is  used  near  the  boundary  the  discrete  matrix  problem  represent¬ 
ing  (4.30)  is 

(L  S'2  L  +  UUT)u^  =  h4  S2  fj^  +  l  (4.31) 

2 

where  L  is  the  discrete  Laplacian,  U  is  an  N  *  4(N-1)  matrix  and  l 
is  a  sparse  vector.  Notice  that  this  problem  has  the  same  structure  as  the 
problem  discussed  in  Chapter  III  except  that  the  diagonal  matrix  S  has 
been  introduced.  The  4(N-1)  x  4(N-1)  matrix 

B  =  (I  +  UT  L"1  S2  L"1  U)  (4.32) 

can  be  generated  and  factored  in  0(N3)  operations.  The  linear  system 
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can  also  be  solved  using  conjugate  gradients.  The  preconditioning  techni¬ 
que  employed  in  Chapter  III  cannot  be  used  in  this  case.  There  remains  to 
investigate  possible  alternatives.  When  S  =  I  the  eigenvalues  of  B 

approximately  equals  cN/i  for  i  =  1,2,3..  .  The  conjugate  gradient 
7 /3 

method  requires  0(N  )  arithmetic  operations  to  solve  (4.31)  for  a  pro¬ 

blem  with  such  a  spectrum. 

Finally  it  should  be  mentioned  that  this  technique  can  be  used  in  com¬ 
bination  with  a  Lanczos  eigenvalue  routine  when  solving  the  eigenvalue 
problem  associated  with  (4.28). 
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CHAPTER  V 
APPLICATIONS 

The  existence  of  an  efficient  numerical  method  for  solving  the  gen¬ 
eralized  biharmonic  equation  (4.1)  makes  it  a  useful  computational  tool 
in  the  construction  of  numerical  methods  for  more  complicated  fourth  or¬ 
der  problems  much  in  the  same  way  as  fast  Poisson  solvers  have  been  used 
in  the  past  ten  years. 

Problems  where  this  numerical  method  may  prove  useful  include  von 
KSrmSn's  equation  (1.5)  and  the  streamfunction  formulation  of  Navier 
Stokes  equation  for  incompressible  flow  (Temam  ^1977 J ) .  A  class  of 
problems  closely  related  to  (4.1),  including  physical  examples,  is  dis¬ 
cussed  by  A.  and  M.B.  Banerjee,  Roy  and  Gupta  ^1978].  In  some  of  these 
applications  a  nonuniform  (graded)  mesh  may  be  advantageous.  Extension 
of  the  numerical  method  to  more  general  fourth  order  equations  having  sep¬ 
arable  lower  order  terms,  is  not  difficult  and  this  makes  it  possible  to 
handle  certain  coordinate  transformations  introducing  such  meshes. 

Two  applications  using  the  numerical  methods  for  equation  (4.1)  will 
be  briefly  discussed  in  the  remainder  of  this  Chapter. 

5.1  The  eigenvalue  problem  for  the  biharmonic  operator. 

Consider  the  eigenvalue  problem 


=  A2U 

in 

R 

u  =  0 

on 

3R 

(5.1) 

un  =  0 

on 

9R  , 

in  a  rectangle  R.  This  problem  defines  the  natural  frequencies  and  the 
natural  modes  of  vibration  of  a  clamped  elastic  plate.  Formulation  (4.1) 
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can  be  used  with  a  =  0,  in  a  Rayleigh  quotient  iteration  when  computing 
approximations  to  the  lowest  modes.  Due  to  the  cubic  rate  of  convergence, 
this  method  is  more  efficient  than  some  of  the  previous  methods  that  have 
been  tried.  (Bauer  and  Reiss  [l972 J ) .  A  modified  form  of  algorithm  4.3 
was  used  when  solving  the  resulting  sequence  of  indefinite  problems. 

The  Fourier  transforms  in  the  algorithm  can  be  omitted  when  doing  this 
iteration.  After  the  transformed  eigenvectors  have  converged,  they  may 
be  Fourier  transformed  in  a  postprocessing  stage  resulting  in  a  substan¬ 
tial  savings  in  computational  work. 

This  section  briefly  describes  the  relationship  between  the  numerical 
method  and  the  space  of  eigenfunctions.  In  addition,  the  behavior  of  the 
first  eigenfunction  near  a  corner  is  studied. 

A  particular  eigenfunction  is  generated  by  only  one  (or  in  the  de¬ 
generate  case  by  two)  of  the  matrices  Crs  given  in  theorem  3.1.  The  no¬ 
tation  (r,s)  r,s  =  1,2,  will  be  used  to  indicate  which  of  the  four  pro¬ 
blems  in  step  4  of  algorithm  4.3  that  /nust  be  solved  for  a  given  eigen¬ 
value.  This  results  in  additional  computational  savings  and  also  pro¬ 
vides  a  more  systematic  way  of  studying  the  eigenfunctions.  Figure  5.1 
lists  the  five  first  distinct  eigenvalues  obtained  by  extrapolating  from 
solutions  using  N  =  63  and  N  =  127.  More  accurate  calculations  can 
easily  be  performed  by  going  to  finer  grids  or  even  better,  by  using  the 
matrices  given  in  lemma  A2.2.  Good  upper  and  lower  bounds  have  been 
published  by  Fichera  [l966]  and  they  are  included  in  figure  5.1. 


Lower  bound 

Estimate 

Upper  bound 

A1 

35.9852 

35.9852 

35.9852 

A2,3 

78.3922 

73.3937 

73.3939 

A4 

108.213 

108.216 

108.217 

A5 

131.573 

131.580 

131.581 

X6 

132.197 

132.220 

132.220 

Figure  5.1  The  first  five  distinct  eigenvalues 
computed  using  N=63  and  N=127  com¬ 
pared  with  lower  and  upper  bounds. 

In  these  calculations  R  was  taken  to  be  the  unit  square  and  X  is  de¬ 
fined  by  (5.1).  In  order  to  relate  the  decomposition  of  the  eigenspace 
to  the  symmetries  of  the  eigenfunctions  and  the  previous  work  of  Fichera, 
let  (x,y)  be  a  point  in  the  first  quadrangle  of  the  square  with  x  >  y. 
Consider  the  eight  points  pi  *  {(x,y),  (y,x),  (-y,x),  (-x,y),  (-x,-y), 
(-y>-x),  (y,-x)},  i  =  1,2, ...8  defining  the  possible  symmetries  of  a 
solution  defined  on  R.  The  following  relationships  hold: 

i)  The  eigenfunctions  with  total  symmetry,  u(pi)  =  u(f>j  ) 
for  i  =  2, 3.. 8,  are  generated  by  C^,  this  group 
corresponds  to  (0000)  in  Fichera' s  notation, 
ii)  The  eigenfunctions  symmetric  around  the  coordinates  axis, 
but  antisymmetric  around  the  diagonals,  u(pi)  =  u(p^) 
i  =  4,5  and  8,  u (p ^ )  =  -  u(p^),  i  =  2,3,6  and  7,  are 
also  generated  by  C^,  this  group  corresponds  to 


(0011)  in  Fichera's  notation. 
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iii)  The  eigenfunctions  which  are  antisymmetric  under  a  rotation 

of  it,  u(pi+4)  =  -  ufp^},  i  =  1,2,3  and  4,  are  generated 

12  21 

by  the  two  matrices  C  and  C  .  This  is  a  degenerate 
case  and  for  each  eigenvalue  in  this  group  there  are  two 
eigenfunctions.  The  two  eigenfunctions  have  the  same  shape, 
but  one  is  rotated  tt/2  compared  to  the  other.  Fichera 
calls  this  case  (01-10) . 

iv)  A  total  antisymmetric  eigenfunction,  u(pi+j)  =  -  u(p^), 

22 

i  *  1,2, 3... 8  is  generated  by  C  ,  corresponding  to 
(1111)  in  Fichera's  notation. 

v)  An  eigenfunction  symmetric  around  the  diagonals,  but  anti¬ 
symmetric  around  the  coordinate  a*is,  u(p.)  =  u(p^), 

i  =  2,5  and  6,  u(p^)  =  -  u(p^),  i  =  3,4,7  and  8,  is  also 
22 

generated  by  C  ,  this  group  is  called  (1100)  by 
Fichera. 

These  results  are  important  in  order  to  understand  which  of  the  linear  sys¬ 
tems  in  step  4  of  algorithm  4.2  that  are  nontrivial  for  a  problem  with  a 
given  symmetry.  (See  Appendix  III.)  The  results  also  indicate  that  it 
may  be  possible  to  refine  the  decomposition  given  in  theorem  3.1,  by  fur¬ 
ther  splitting  the  matrices  Crs. 

Next,  consider  the  shape  of  the  first  eigenfunction  in  the  neighbor¬ 
hood  of  a  corner.  Bauer  and  Reiss  ^1972j  reported  the  existence  of  nodal 
lines  in  the  vicinity  of  corners,  but  their  numerical  method  severely 
limited  a  detailed  study.  Other  researchers  noticed  that  the  nodal  line 
moved  towards  the  corner  as  the  grid  was  refined,  and  questioned  its  ex¬ 
istence  in  the  limit.  Theoretically  this  had  been  an  open  question  for 
quite  some  time.  (Very  recently,  after  this  investigation  was  completed. 


Coffman  ^1980j  informed  the  author  that  he  had  proved  the  existence  of 
nodal  lines.) 

The  fine  grids  permitted  by  the  new  numerical  method  made  it  possible 
to  study  this  question  numerically.  The  theory  in  Chapter  III  and  Appen¬ 
dix  II  may  also  be  used  to  investigate  this  phenomenon  in  the  continuous 
case.  Figures  5.2  and  5.3  show  contour  plots  of  the  first  eigenfunction  near 
a  corner  of  the  unit  square  based  on  calculations  using  N=127  and  N=255. 
Figure  5.4  shows  a  surface  plot  of  the  same  area  based  on  the  finest  grid. 
Finally,  after  normalizing  the  eigenfunction  such  that  its  maximum  value 
is  1,  the  extrapolated  values  based  on  the  two  grids,  are  shown  in 
figure  5.5. 


N  =  127 

Figure  5.2  Contour  plots  of  the  first  biharmonic 
eigenfunction  near  a  corner. 


Figure  5.3  Contour  plots  of  the  first  bi harmonic 
eigenfunction  near  a  corner. 


Figure  5,4  Surface  plot  of  the  first  biharmonic 
eigenfunction  near  a  corner. 
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t 

5/128 

-4.74 

-6.56 

+8.64 

+50.37 

+124.71  ] 

4/128 

-5.78 

-13.09 

i  -10.65 

+8.94 

1 

+50.37  1 

3/128 

-5.12 

-13.40 

-16.96 

-10.65 

1 

+8.64  [ 

2/128 

-3.21 

-9.16 

-13.40 

-13.09 

1 

-6.56  | 

I 

1/128 

-1.00 

-3.21 

-5.12 

-5.78 

-4.74 

- — t ) 

1 

x=l/128 

x=2/128 

x=3/128 

x=4/128 

x=5/128 

Figure  5.5  Extrapolated  values  of  the  eigenfunction 
near  a  corner.  The  numbers  are  scaled 
up  by  a  factor  106. 

5.2  Navier  Stokes  equation. 

As  an  illustration  of  a  problem  where  a  numerical  method  for  (4.1) 
with  nonzero  parameter  a  can  be  used,  consider  the  driven  cavity  model 
problem  for  the  nonlinear,  time  dependent  Navier  Stokes  equation.  In¬ 
troducing  a  stream  function  ¥  in  the  usual  way,  the  equation  was  solved 
using  the  following  scheme: 

Ak+i  -  !k4Vi  ■  R<yi,x  -  Wk  -  •  <5-2» 

Here  k  denotes  the  current  time  level  and  At  the  time  discretization 
step.  The  equation  was  discretized  in  space  using  second  order  accurate 
centered  differences  and  the  13-point  approximation  with  quadratic  boun¬ 
dary  extrapolation  was  used  to  approximate  the  biharmonic  operator.  No¬ 
tice  that  this  is  a  special  case  of  (4.1)  with  nonzero  a  and  6  =  0. 

The  problem  was  solved  in  a  square  region  with  Reynold's  number  R  =  200 
and  boundary  conditions  ¥  =  0  and  ¥n  =  0  except  at  the  side  y  =  1 
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where 


-  sint  0  <  t  <  tt/2 

fn  = 

-  1  tt/2  <  t  <  5  . 


This  corresponds  to  an  acceleration  of  the  moving  wall  up  to  the  standard 
velocity  used  in  stationary  calculations.  A  31+31  grid  was  used  and 
500  timesteps  each  of  length  0.01  was  taken.  (This  is  smaller  than  re¬ 
quired  for  stability  with  this  Reynold's  number.)  The  execution  time 
for  this  problem  was  approximately  one  minute  on  an  IBM  370/168.  The 
velocity  fields  are  shown  in  figures  5.6  and  5.7  at  two  different  times. 


o.o  t.o 

Velocity  field  t  =  1.5 


Figure  5.6  Discrete  solution  of  the  time  dependent 

Navier  Stokes  equation  at  Reynold's  number  200. 
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The  corresponding  streamfunctions  are  contour  plotted  in  figures  5.8  and 
5.9.  The  flow  is  not  stationary  at  time  5,  but  changes  very  slowly  into 
a  final  state  after  a  time  equal  20  with  a  main  vortex  center  T  =  0.105 
at  coordinates  x  =  0.41,  y  =  0.66  in  good  agreement  with  stationary 
calculations  with  this  grid. 


Figure  5.7  Discrete  solution  of  the  time  dependent  Navier 
Stokes  equation  at  Reynold's  number  200. 


Stream  function  t  =  5.0 


Figure  5.9  Discrete  solution  of  the  time  dependent  Navier 
Stokes  equation  at  Reynold's  number  200. 

It  should  be  mentioned  that  the  difference  scheme  (5.2)  is  unsatisfactory 
for  large  Reynold's  number,  due  to  the  fact  that  the  nonlinear  term  is 
handled  in  a  fully  explicit  way.  Computational  experience  indicates  that 
the  numerical  methods  developed  in  this  thesis,  for  the  biharmonic  equa¬ 
tion,  can  be  used  as  a  part  of  more  sophisticated  methods  when  solving 
the  stationary  driven  cavity  problem  at  large  Reynold's  number. 
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APPENDIX  I 

Proof  of  Theorem  3.1 

The  notation  used  is  consistent  with  the  notation  in  Chapter  III. 
The  development  in  this  section  is  very  similar  to  the  derivation  of 
(3.16)  and  (3.17),  but  individual  components  are  defined  instead  of  sub¬ 
matrices.  An  explicit  representation  for  the  quantity  QN  Ti  QN  in 
Theorem  3.1  is  needed. 

QN  Ti  QN  =  *N  +  8M  .  ,  S1n2  fe  QN  SkX  QN  1=1,2  ' 

K  1  j  I  '  t  y  •  • 

Let  Ak  =  QN  Sk~  % 

"  hlllH  -  Mr2  +  2  KI  ^  S)'1  KI  '‘'k'1) 

where  KN  =  QN  UN  . 

Define  the  2x2  matrix 


\  -  h  * 2  ’■k1 

ak+bk  ak“bk 

=  + 

ak~bk  ak+bk 


where 

ak 


h  8» 


N 

Z 


sin 


and 


b 


k 


*4  8, 


N 

Z 

j-2.4.6, 


sin 


It  is  clear  that  any  2*2  linear  system 


V 


=  r 


can  be  solved  in  exactly  the  same  way  as  described  in  Chapter  III  for  the 
block  case.  Recalling  the  definition  of  a£s  in  Theorem  3.1,  this 
leads  to 

h  *  H  ■  <ri*r2>/“k" 

Z1  "  z2  =  (rl'r2)/ctkN  ' 

Let  e-  be  the  j'th  unit  vector  of  dimension  N  and  consider  the  j'th 

J 

k 

column  a.  of  A.  .  Observe  that 
J  K 

kn  ’k1  ej  =‘\^rsin  fe-’kjfi)  if  j  ^  odd 

r  — 

■"VsTt  sin  ifjlseven. 


■■yw 


cin  NlT 

sin  n+t 


Zl+Z2 

W 

Zl+Z2 


Using  the  above  expressions  in  the  definition  of  gives  the  following 

i/ 

expressions  for  element  a-^  .: 

au  s  »kj  si0  -  Sn  s(n  srr  s1"  fe  <  »ki  fkj)  '-J  both  odd  °r 

i,j  both  even. 


aij  *  0  • 


i  odd,  j  even,  or 
i  even,  j  odd. 


PN  Ak  PN  =  PN  PN  "  &N 


pi  o 

rk  u 


0  F 


Therefore 
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where  f£  has  elements  of  the  form 


1  IT  J  TT 

,,r.  .  sin  n^t  sin  m 

'  Vij  " 


rN 


This  finally  gives 


P  n-*5  O  T  0  D”*5  p"'”  = 
N  ui  gN  i  gN  ui  kN 


1  ■  sn  6m  pn  °i*  P5  k=i  ^+2>i  sin2  m 


r  =  1,2 


F*  o 
k  u 


0  F 


P  D-*5  P^  i 
kN  ui  kN  1 


Comparing  this  expression  componentwise  with  the  matrix 
I  -  (C1r)T  Cir  i  *  1,2  r  =  1,2 
in  Theorem  3.1  concludes  the  proof.  Q 
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APPENDIX  II 


Proof  of  Theorem  3.2  and  Theorem  3.3. 


Lemma  A2.1 


If  SN  =  ,E 


J  sin 

j=l  (B-cos 


B  >  1 


S  =2(N+i)/-4- 

N  \  1-a2  l-(aN+1)2  l-(aN+1)2  1-a2 


where  a  =  B  -  VB  -1  . 


0  <  a  <  1 


-  Vb^T 


Proof : 


Define  f(x)  =  4  a 


2 

2  sin  x 


(1+a  -2a  cos  x)* 


Fettis  ^ 1 97 9J  pointed  out  that  the  application  of  Poisson's  summation 
formula  to  this  function  gives  the  relation 


Hf(0) 


+  +  +  *  f(7I>  =  [F0  +  2  Jj  F2k(N+l)] 


where 


r* 

F  =  I  f (x)  cos  mx  dx 

m  Jo 


is  a  cosine  transform  of  f(x)  (Magnus  and  Oberhettinger  [l9*8,  p.  217 


Integration  by  parts  yields 


Fm  =  4  a 

m 


(1+a  -2a  cos  x)‘ 


sin  x  cos  mx  dx 


=  2  a 


(1+a  -2a  cos  x) 


dx  -  2  a  m  f  -S»1, PJl  sin  PUL-  dx 

Jn  M+a  rnc  vi 


'0  (1+a  -2a  cos  x) 
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These  integrals  are  well  known  and  can  be  found  in  Gradshteyn  and  Ryzhik 
[l965,  p.  366-367]  . 


COS  X 


(1+a  -2a  cos  x) 


dx 


cos  x  cos  mx  j 

0  X 

(1+a  -2a  cos  x) 


sin  x  sin  mx 
(l+a2-2a  cos  x) 


dx 


m-1  1+a 


2 


tt  m-1 
2  a 


a2<  1 


m=0 


a2  <  1  m=l,2,3 


a2  <  1  m=l,2,3  . 


Therefore 


2n  a 
0  "  1-a2 


Fn 

=  tt  a 
m 


lai-J 

1-a2  J 


1)2)3*  •  •  ) 


and 


SN  =  2(N+1)  [l  +  Z  (a2(N+1))k  -  2(N+1)  £  k(a2(N+1) )kl 

1-a*  L  a*  k=l  a*  k=l  J 


=  2(n+i)  -  it4)l  •  □ 

[l-a2  l-(aN  X)Z  l-(aN+1)2  1-a2  J 


Remark: 


SN  =  2(N+1) 


a  +0(a2(N+1>) 


1-a 
__B _ 


=  (N+l  )(-4=r-  1)  +  0(a2(N+1)) 


This  is  the  approximation  obtained  if  is  approximated  by 
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All  the  error  terms  in  the  Euler-McLaurin  formula  (Dahlquist  and  Bjorck 
[l974,  p.  297])  are  zero  in  this  case,  since  f^(0)  =  f^(ir)  =  0  for 
k  odd.  This  much  simpler  expression  is  always  an  upper  bound  for  SN 
since 


Now  consider 
S 

Claim: 


even 


N 

I 


sin 


2  jir 


j-2,4,6  (B  -  cos  fay 


B  >  1  ,  N  odd  . 


Seven  =  <N+1> 


N+l 


1-a2  l-aN+1 


Proof : 


N+l 


Replace  N+l  by  in  the  proof  of  Lemma  A2.1.  It  is  easily 

seen  that  the  proof  still  holds.  Q 

The  sum  over  only  odd  j  can  now  be  found  as  the  difference  between 

Sw  and  S-w„ . 

N  even 

The  above  derivation  furnishes  a  closed  form  expression  for  the 
iN 

quantity  ak  defined  in  Theorem  3.1  and  therefore  closed  form  expres¬ 
sions  for  the  individual  matrix  elements  c^  also  given  in  Theorem 
3.1. 


An 


upper  bound  for  the  largest  singular  value  of  the  matrices 
Crs  will  be  derived.  The  following  well  known  inequality  will  be  used: 

i  (  ll(Crs)T  Crs|lii  (  !|Crs||j  l|Crs  .  [(mj,  r  Cj?)(m^x  I  c”)]* 
since  all  matrix  elements  are  positive. 
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For  fixed  i  and  j  an  element  c1^  increases  with  the  dimension  N  of 
the  matrix.  The  interesting  case  to  consider  is  the  limiting  behavior 
as  N  becomes  large.  The  following  important  Lemma  gives  the  precise 
form  of  the  limit  matrix. 

Lenwa  A2.2  The  matrix  Crs  defined  in  Theorem  3.1  has  elements: 

.rs  _  8  V^Vyl  »f  ,  ■  ,  1  .  r-1.2 

'j  ti  { ir^+Js2)2  b'J5  bj1-  (N+l)2  s=1’2 


1  J 


where  a*]s  and  b*Js  are  exponentially  close  to  1  in  j  and  given  by 

J  J 


and 


krc 

(l+(-l)  rS 

“jy.1T 

e  r  ) 

■>?- 

kr? 

(l+2(-l)  rs 

“jy.1T 

Jrre 

-2  j, 
-  e 

7T 

s 

</) 

ii 

0  if  r  = 

s  =  1  or 

r  =  2 

k  = 
rs 

1  if  r  = 

s  =  2  or 

r  *  1 

ii 

s- 

•o 

2j  -  1  if 

r  =  1 

Jr  =  2j  if  r  =  2  . 

Proof : 

Derive  Taylor  expansions  for  each  element  cr!  in  the  variable 

1 

around  zero.  This  is  rather  tedious  to  do  by  hand  and  the  symbolic 
manipulation  program  MACSYMA  [l977]  was  used  when  deriving  the  above 
expressions.  Q 

The  3  by  3  leading  principal  minors  of  the  (infinite)  limit  matrix 

C  are  compared  with  the  corresponding  minors  of  for  N=63  in 
00  oJ 

Figure  A2.1.  It  is  interesting  to  observe  that  the  approximation  is  quite 
good  already  for  this  value  of  N. 

-«V 
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Figure  A2.1.  Leading  principal  minors  of  C.,  for  N  =  63  and  N 
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In  order  to  obtain  upper  bounds  for  the  row  and  column  sums  of  C^s  the 
following  Lemma  is  needed: 

Lemma  A2.3 
Let 


r  -  VJrVVV 

Si  =  1  — 7~  $ 

j=i  (K  +  j l) 


7t  t  m  r=1  or  2 


r  “r ' 


for  some  given  i  e  (1,2,3...). 
Then 


1V2  _  1  25  ,3v3/4  <-r  ttY?  1  25  ,3, 
TT  i  32  (5}  -  Si  -16  +  T  32  {5} 


3/4 


for  all  i  and  r=l  or  2. 
Proof: 


s.=  ;  ;  _u^i3/2 


1  j-l  (i2+j2)2  1  j-1  (l+(j/i )2) 


Let 


_  x 


3/2 


f(x)  * - *-*■ 

(1+x2)2 


max 


=  f(\/|)  ■  i  (|) 


3/4 


0  <  x  <  00 


/; 


f (x)dx 


=  iYI 


Clearly 


lim  S.  =  f  f (x )dx  =  . 

i-*»  1  Jq  8 


By  considering  the  discrete  sum  for  finite  i  it  follows  that 

2$-  -if  <  S.  <  2^-  +  i  f 

1  max  -  1  -  8  i  max 


Doing  the  same  analysis  for  the  even  sum  S 


even 


# 
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S 


even 


1 

i 


E 

j=2,4, . . 


(Q7i)3/? 
(l+j/i )2)2 


results  in 


2 

16 


•I  g  <  S 

i  3max  —  even 


where  the  appropriate  function  is 


g(x) 


2VF  x3/2 
(1+4  x2)2 


g(x)dx  = 


i V7 

16 


and 


3max 


=  g 


Combining  these  two  results  proves  the  Lemma.  D 

It  is  now  easy  to  prove  Theorem  3.2.  As  can  be  easily  verified,  the 
row  sum 

"  11 

E  CH  <  .75853 
j=l 

is  larger  than  any  other  bound  that  can  be  obtained  for  small  i  (say 
i  <  20).  Lemma  A2.3  shows  that  this  value  certainly  cannot  be  exceeded 
with  any  larger  i.  (The  factors  a^s  and  b^s  are  exponentially  small 
in  i  and  j  and  present  no  difficulties.  □ 

Remark: 

Computations  confirm  that  the  maximum  singular  value  belongs  to  C^. 

21  12 

The  upper  bound  using  the  matrix  C  or  C  is 


f(max  1  c}2)(max  e  <.  -?43  . 

1  i  j=l  1J  i  j=l  1J  1 

22 

The  bound  for  the  matrix  C  is  given  by 


max  1  C2^  <  i  VF  (corresponds  to  i=®  in  Lemma  A2.3). 
i  j=l  1J  “  1 
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Note  that  the  resulting  theory  also  provides  lower  bounds  for  the  largest 
singular  values.  Since  the  matrices  are  positive,  the  smallest  row  or 
column  sum  will  be  such  a  bound.  (Varga  [l962,  p.  3lj).  In  particular, 
computations  Indicate  that  amax  >  .706. 

The  analysis  gives  an  explicit  representation  for  the  continuous 
biharmonic  operator  in  a  rectangular  region.  This  representation  can  be 
used  to  study  properties  of  the  biharmonic  operator  in  the  given  geometry. 
Finally  consider  Theorem  3.3.  An  upper  bound  for  the  sum  of  the 

rc  11 

singular  values  of  the  matrices  C  is  needed.  Consider  the  matrix  C  . 
Since  is  symmetric  it  is  sufficient  to  look  at  its  trace. 


N 

E 

i=l 


a11 

2  1  ,ai  ,2  <  1, 

„l?T7(^TT> 


N 

E 


+  In  N  +  6  + 


<4» 


where  y  is  Euler's  constant,  y  =  .5772...  and  6  is  the  contribution 
from  the  small  term  a^/b^.  Letting  N  this  shows  that  the  con¬ 

stant  in  front  of  the  In  N  term  in  Theorem  3.3  (taken  equal  to  1  there) 

tends  to  ^  as  N  becomes  large.  A  similar  argument  gives  the  same  re- 
22 

suit  for  C  .  It  is  an  obvious  conjecture  that  this  result  is  true  also 
12 

for  C  ,  but  since  it  is  of  little  importance  in  this  context  the  weaker 
statement  in  Theorem  3.3  is  given  instead.  This  can  be  proved  by  consi¬ 
dering  E  (cH)2  (the  Frobenius  norm  of  C*2).  [3 
ij 

Figure  A2.2  shows  the  computed  sum  of  the  singular  values  normalized 

1  9  T 

by  the  factor  for  the  three  cases  of  interest,  (C“  =  (C“)'). 


.  V* 
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APPENDIX  III 

Test  problems  used  in  the  examples. 

The  following  list  defines  the  test  problems  refered  to  by  number 
in  the  main  body  of  this  dissertation.  The  solution  u(x,y)  is  given. 

In  addition,  the  subproblems  in  step  4  of  algorithm  4.2,  that  are  nontri¬ 
vial  when  solving  these  problems  on  the  square  0  <  x,y  <  1  are  listed, 
using  the  notation  from  section  5.1.  This  is  of  importance  when  consi¬ 
dering  the  results  of  section  2.3.  It  also  determines  the  work  required 
to  find  the  solution. 

1.  u  ■  xy{l-x)(l-y) 

Subproblem:  (1,1). 

Comments:  This  problem  is  frequently  used  since  the 
truncation  error  is  zero. 

This  problem  has  also  been  used  by  Ehrlich 
and  Gupta  [l975]. 

2.  u  =  x2 3  +  y2  -  x  ex  cosy 

Subproblems:  (1,1),  (1,2),  (2,1)  and  (2,2) 

Comments:  This  problem  was  considered  by  Gupta  and 
Manohar  [l979j  ,  and  in  a  slightly  modified 
form  by  Ehrlich  and  Gupta  {l975j. 

3.  u  *  2xy  +  x2  -  3y2 

Subproblems:  (1,1),  (1,2),  (2,1)  and  (2,2). 

Comments:  The  problem  has  zero  truncation  error  if 
the  cubic  boundary  approximation  is  used. 

It  has  been  considered  by  Greenspan  and 
Schultz  [l972j  and  by  Gupta  and  Manohar  [l979] . 


u  =  xy  (1-xJ  (1-y) 

Subproblem:  (1,1). 

Comments:  This  solution  is  simply  problem  1  squared. 

It  was  used  by  Bauer  and  Reiss  [l972j  and 
by  Gupta  and  Manohar  [l979j . 


u  =  (1-cos  2ttx)(  1-cos  2iry) 

Subproblem:  (1,1) 

Comments:  Another  symmetric  problem  considered  by 
Bauer  and  Reiss  ^1972j  and  by  Gupta  and 
Manohar  |1979j. 

u  =  ex  sinx  +  e^cosy 

Subproblems:  (1,1),  (1,2)  and  (2,1). 

Comments:  This  solution  is  the  sum  of  two  functions 
each  depending  on  only  one  variable.  No¬ 
tice  that  only  three  subproblems  are  needed. 

u  =  x3log(l+y)  +  y/(l+x) 

Subproblems:  (1,1),  (1,2),  (2,1)  and  (2,2). 

Comments:  This  is  a  good  general  problem. 

u  =  sirnrx 

Subproblems:  (1,1)  and  (2,1). 

Comments:  This  problem  falls  in  between  the  highly 
symmetric  and  the  general  problems. 


u  =  cosrx  cos^y 
Subproblems:  (2,2) , 

Comments:  A  highly  symmetric  problem. 


PT-r  — - 
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8  8-p  „  i  „  i 

10.  u  •  L  L  pyp_1  xq_1 

I  p=l  q=l 

Subproblems:  (1,1),  (1,2),  (2,1)  and  (2,2). 

Comments:  This  problem  is  constructed  in  order  to  have 

a  problem  where  all  Taylor  coefficients  are 

nonzero  up  to  a  total  degree  of  seven. 
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